METHODS FOR ANALYSIS OF SPECTRAL DATA AND THEIR APPLICATIONS:

OSTEOPOROSIS 

RELATED APPLICATIONS 

This application is related to (and where permitted by law, claims priority to):

(a) United Kingdom patent application GB 0109930.8 filed 23 April 2001;

(b) United Kingdom patent application GB 0117428.3 filed 17 July 2001;

(c) United States Provisional patent application USSN 60/307,015 filed 20 July 2001;
the contents of each of which are incorporated herein by reference in their entirety.

This application is one of five applications filed on even date naming the same applicant:

(1) attorney reference number WJW/LP5995600 (PCT/GB02/ );

(2) attorney reference number WJW/LP5995618 (PCT/GB02/ );

(3) attorney reference number WJW/LP5995626 (PCT/GB02/ );

(4) attorney reference number WJW/LP5995634 (PCT/GB02/ );

(5) attorney reference number WJW/LP5995642 (PCT/GB02/ );

the contents of each of which are incorporated herein by reference in their entirety. 

TECHNICAL FIELD 

This invention pertains generally to the field of metabonomics and, more particularly, to
chemometric methods for the analysis of chemical, biochemical, and biological data, for
example, spectral data, for example, nuclear magnetic resonance (NMR) spectra, and
their applications, including, e.g., classification, diagnosis, prognosis, etc., especially in
the context of bone disorders, e.g., conditions associated with low bone mineral density,
e.g., osteoporosis.

BACKGROUND 

Throughout this specification, including the claims which follow, unless the context
requires otherwise, the word "comprise," and variations such as "comprises" and
"comprising," will be understood to imply the inclusion of a stated integer or step or group
of integers or steps but not the exclusion of any other integer or step or group of integers
or steps.

It must be noted that, as used in the specification and the appended claims, the singular
forms "a," "an," and "the" include plural referents unless the context clearly dictates
otherwise.

Ranges are often expressed herein as from "about" one particular value, and/or to 
"about" another particular value. When such a range is expressed, another embodiment 
includes from the one particular value and/or to the other particular value. Similarly,
when values are expressed as approximations, by the use of the antecedent "about," it
will be understood that the particular value forms another embodiment.



Biosystems can conveniently be viewed at several levels of bio-molecular organisation
based on biochemistry, i.e., genetic and gene expression (genomic and transcriptomic),
protein and signalling (proteomic) and metabolic control and regulation (metabonomic).
There are also important cellular ionic regulation variations that relate to genetic,
proteomic and metabolic activities, and these should also be studied systematically, even
at the cellular and sub-cellular level, to complete the full description of the
bio-molecular organisation of a bio-system.



Significant progress has been made in developing methods to determine and quantify 
the biochemical processes occurring in living systems. Such methods are valuable in
the diagnosis, prognosis and treatment of disease, the development of drugs, the
improvement of therapeutic regimes for current drugs, and the like.



Many diseases of the human or animal body (such as cancers, degenerative diseases, 
autoimmune diseases and the like) have an underlying basis in alterations in the 
expression of certain genes. The expressed gene products, proteins, mediate effects
such as abnormal cell growth, cell death or inflammation. Some of these effects are
caused directly by protein-protein interactions; others are caused by proteins acting on
small molecules (e.g., "second messengers") which trigger effects including further gene
expression.






Likewise, disease states caused by external agents such as viruses and bacteria
provoke a multitude of complex responses in the infected host.

In a similar manner, the treatment of disease through the administration of drugs can
result in a wide range of desired effects and unwanted side effects in a patient.

In recent years, it has been appreciated that the reaction of human and animal subjects
to disease and treatments for them can vary according to the genomic makeup of an
individual. This has led to the development of the field of "pharmacogenomics." A fuller
understanding of how an individual's own genome reacts to a particular disease and/or
drug treatment will allow the development of new therapies, as well as the refinement of
existing ones.


At the genetic level, methods for examining gene expression in response to these types
of events are often referred to as "genomic methods," and are concerned with the
detection and quantification of the expression of an organism's genes, collectively
referred to as its "genome," usually by detecting and/or quantifying genetic molecules,
such as DNA and RNA. Genomic studies often exploit proprietary "gene chips," which
are small disposable devices encoded with an array of genes that respond to extracted
mRNAs produced by cells (see, for example, Klenk et al., 1997). Many genes can be
placed on a chip array and patterns of gene expression, or changes therein, can be
monitored rapidly, although at some considerable cost.


However, the biological consequences of gene expression, or altered gene expression
following perturbation, are extremely complex. This has led to the development of
"proteomic methods" which are concerned with the semi-quantitative measurement of
the production of cellular proteins of an organism, collectively referred to as its
"proteome" (see, for example, Geisow, 1998). Proteomic measurements utilise a variety
of technologies, but all involve a protein separation method, e.g., 2D gel electrophoresis,
allied to a chemical characterisation method, usually some form of mass spectrometry.

At present, genomic methods have a high associated operational cost and proteomic
methods require investment in expensive capital cost equipment and are labour
intensive, but both have the potential to be powerful tools for studying biological
response. The choice of method is still uncertain since careful studies have sometimes
shown a low correlation between the pattern of gene expression and the pattern of
protein expression, probably due to sampling for the two technologies at inappropriate
time points. See, e.g., Gygi et al., 1999. Even in combination, genomic and proteomic
methods still do not provide the range of information needed for understanding
integrated cellular function in a living system, since they do not take account of the
dynamic metabolic status of the whole organism.

For example, genomic and proteomic studies may implicate a particular gene or protein
in a disease or a xenobiotic response because the level of expression is altered, but the
change in gene or protein level may be transitory or may be counteracted downstream
and as a result there may be no effect at the cellular and/or biochemical level.
Conversely, sampling tissue for genomic and proteomic studies at inappropriate time
points may result in a relevant gene or protein being overlooked.

Gene-based prognosis has yet to become a clinical reality for any major prevalent 
disease, almost all of which have multi-gene modes of inheritance and significant 
environmental impact making it difficult to identify the gene panels responsible for 
susceptibility. 

While genomic and proteomic methods may be useful aids, for example, in drug
development, they do suffer from substantial limitations. For example, while genomic
and proteomic methods may ultimately give profound insights into toxicological
mechanisms and provide new surrogate biomarkers of disease, at present it is very
difficult to relate genomic and proteomic findings to classical cellular or biochemical
indices or endpoints. One simple reason for this is that, with current technology and
approach, the correlation of the time-response to drug exposure is difficult. Further
difficulties arise with in vitro cell-based studies. These difficulties are particularly
important for the many known cases where the metabolism of the compound is a
prerequisite for a toxic effect, and especially where the target organ is not the site of
primary metabolism. This is particularly true for pro-drugs, where some aspect of in situ
chemical (e.g., enzymatic) modification is required for activity.

Metabonomics 


A new "metabonomic" approach has been developed which is aimed at augmenting and
complementing the information provided by genomics and proteomics. "Metabonomics"
is conventionally defined as "the quantitative measurement of the multiparametric
metabolic response of living systems to pathophysiological stimuli or genetic
modification" (see, for example, Nicholson et al., 1999). This concept has arisen
primarily from the application of NMR spectroscopy to study the metabolic
composition of biofluids, cells, and tissues and from studies utilising pattern recognition
(PR), expert systems and other chemoinformatic tools to interpret and classify complex
NMR-generated metabolic data sets. Metabonomic methods have the potential,
ultimately, to determine the entire dynamic metabolic make-up of an organism.

As outlined above, each level of bio-molecular organisation requires a series of analytical
bio-technologies appropriate to the recovery of the individual types of bio-molecular data.
Genomic, proteomic and metabonomic technologies by definition generate massive data
sets which require appropriate multivariate statistical tools (chemometrics,
bioinformatics) for data mining and to extract useful biological information. These data
exploration tools also allow the inter-relationships between multivariate data sets from
the different technologies to be investigated; they facilitate dimension reduction and
extraction of latent properties and allow multidimensional visualization.

This leads to the concept of "bionomics", the quantitative measurement and
understanding of the integrated function (and dysfunction) of biological systems at all
major levels of bio-molecular organisation. In the study of altered gene expression
(known as transcriptomics), the variables are mRNA responses measured using gene
chips. In proteomics, protein synthesis and associated post-translational modifications are
typically measured using (mainly) gel electrophoresis coupled to mass spectrometry. In
both cases, thousands of variables can be measured and related to biological end-points
using statistical methods. In metabolic (metabonomic) studies, only NMR (especially 1H)
and mass spectrometry have been used to provide this level of data density on
bio-materials, although these data can be supplemented by conventional biochemical
assays.

For in vivo mammalian studies, the ability to perform metabonomic studies on biofluids
such as plasma, CSF and urine is very important because it gives integrated
systems-based information on the whole organism. Furthermore, in clinical settings, for the full
utilization of functional genomic knowledge in patient screening, diagnostics and
prognostics, it is much more practical and ethically acceptable to analyze biofluid
samples than to perform human tissue biopsies and measure gene responses.

A pathological condition or a xenobiotic may act at the pharmacological level only and
hence may not affect gene regulation or expression directly. Alternatively, significant
disease or toxicological effects may be completely unrelated to gene switching. For
example, exposure to ethanol in vivo may cause many changes in gene expression but
none of these events explains drunkenness. In cases such as these, genomic and
proteomic methods are likely to be ineffective. However, all disease or drug-induced
pathophysiological perturbations result in disturbances in the ratios, concentrations,
binding or fluxes of endogenous biochemicals, either by direct chemical reaction or by
binding to key enzymes or nucleic acids that control metabolism. If these disturbances
are of sufficient magnitude, effects will result which will affect the efficient functioning of
the whole organism. In body fluids, metabolites are in dynamic equilibrium with those
inside cells and tissues and, consequently, abnormal cellular processes in tissues of the
whole organism following a toxic insult or as a consequence of disease will be reflected
in altered biofluid compositions.

Fluids secreted, excreted, or otherwise derived from an organism ("biofluids") provide a
unique window into its biochemical status since the composition of a given biofluid is a
consequence of the function of the cells that are intimately concerned with the fluid's
manufacture and secretion. For example, the composition of a particular fluid (e.g.,
urine, blood plasma, milk, etc.) can carry biochemical information on details of organ
function (or dysfunction), for example, as a result of xenobiotics, disease, and/or genetic
modification. Similarly, the composition and condition of an organism's tissues are also
indicators of the organism's biochemical status.

In general, a xenobiotic is a substance (e.g., compound, composition) which is
administered to an organism, or to which the organism is exposed. In general,
xenobiotics are chemical, biochemical or biological species (e.g., compounds) which are
not normally present in that organism, or are normally present in that organism, but not
at the level obtained following administration/exposure. Examples of xenobiotics include
drugs, formulated medicines and their components (e.g., vaccines, immunological
stimulants, inert carrier vehicles), infectious agents, pesticides, herbicides, substances
present in foods (e.g., plant compounds administered to animals), and substances
present in the environment.

In general, a disease state pertains to a deviation from the normal healthy state of the
organism. Examples of disease states include, but are not limited to, bacterial, viral, and
parasitic infections; cancer in all its forms; degenerative diseases (e.g., arthritis, multiple
sclerosis); trauma (e.g., as a result of injury); organ failure (including diabetes);
cardiovascular disease (e.g., atherosclerosis, thrombosis); and, inherited diseases
caused by genetic composition (e.g., sickle-cell anaemia).

In general, a genetic modification pertains to alteration of the genetic composition of an
organism. Examples of genetic modifications include, but are not limited to: the
incorporation of a gene or genes into an organism from another species; increasing the
number of copies of an existing gene or genes in an organism; removal of a gene or
genes from an organism; and, rendering a gene or genes in an organism non-functional.

Biofluids often exhibit very subtle changes in metabolite profile in response to external
stimuli. This is because the body's cellular systems attempt to maintain homeostasis
(constancy of internal environment), for example, in the face of cytotoxic challenge. One
means of achieving this is to modulate the composition of biofluids. Hence, even when
cellular homeostasis is maintained, subtle responses to disease or toxicity are expressed
in altered biofluid composition. However, dietary, diurnal and hormonal variations may
also influence biofluid compositions, and it is clearly important to differentiate these
effects if correct biochemical inferences are to be drawn from their analysis.

Metabonomics offers a number of distinct advantages (over genomics and proteomics) in
a clinical setting: firstly, it can often be performed on standard preparations (e.g., of
serum, plasma, urine, etc.), circumventing the need for specialist preparations of cellular
RNA and protein required for genomics and proteomics, respectively. Secondly, many of
the risk factors already identified (e.g., levels of various lipids in blood) are small
molecule metabolites which will contribute to the metabonomic dataset.

Application of NMR to Metabonomics

One of the most successful approaches to biofluid analysis has been the use of NMR
spectroscopy (see, for example, Nicholson et al., 1989); similarly, intact tissues have
been successfully analysed using magic-angle-spinning 1H NMR spectroscopy (see, for
example, Moka et al., 1998; Tomlins et al., 1998).

The NMR spectrum of a biofluid provides a metabolic fingerprint or profile of the
organism from which the biofluid was obtained, and this metabolic fingerprint or profile is
characteristically changed by a disease, toxic process, or genetic modification. For
example, NMR spectra may be collected for various states of an organism (e.g.,
pre-dose and various times post-dose, for one or more xenobiotics, separately or in
combination; healthy (control) and diseased animal; unmodified (control) and genetically
modified animal).

For example, in the evaluation of undesired toxic side-effects of drugs, each compound
or class of compound produces characteristic changes in the concentrations and
patterns of endogenous metabolites in biofluids that provide information on the sites and
basic mechanisms of the toxic process. NMR analysis of biofluids has successfully
uncovered novel metabolic markers of organ-specific toxicity in the laboratory rat, and it
is in this "exploratory" role that NMR as an analytical biochemistry technique excels.
However, the biomarker information in NMR spectra of biofluids is very subtle, as
hundreds of compounds representing many pathways can often be measured
simultaneously, and it is this overall metabonomic response to toxic insult that so well
characterises the lesion.

Another important advantage of NMR-based metabonomics over genomics or
proteomics is the intrinsic analytical accuracy of NMR spectroscopy. Reanalysis of the
same sample by 1H NMR spectroscopy results in a typical coefficient of variation for the
measurement of peak intensities in a spectrum of less than 5% across the whole range
of peaks. Thus, if the appropriate experiments are undertaken, on average each peak
intensity will lie in the range 0.95 to 1.05 of its true value. In addition, it is
possible using NMR spectroscopy to measure absolute amounts or concentrations of a
number of analytes, whereas using gene chip technology only fold changes can be
determined. The best available accuracy achieved using gene chips is a twofold
change (i.e., the value for each parameter lies in the range 0.50 to 2.00 fold of the "true"
value), and proteomic technology is even less intrinsically accurate.
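The coefficient of variation referred to above is the standard deviation of repeated measurements of a peak divided by their mean. A minimal Python sketch, assuming NumPy and a hypothetical layout with one row per re-acquisition of the same sample and one column per peak, could compute it as follows; the intensities are invented for illustration and are not measured data.

```python
import numpy as np

def peak_cv(replicates: np.ndarray) -> np.ndarray:
    """Per-peak coefficient of variation (%) across repeated acquisitions.

    replicates: shape (n_repeats, n_peaks); each row holds the peak
    intensities from one re-analysis of the same sample (hypothetical layout).
    """
    mean = replicates.mean(axis=0)
    sd = replicates.std(axis=0, ddof=1)   # sample standard deviation over repeats
    return 100.0 * sd / mean

# Invented intensities (arbitrary units) from three re-analyses of one sample.
reps = np.array([
    [100.0, 50.0, 40.0],
    [102.0, 49.0, 41.0],
    [ 98.0, 51.0, 39.0],
])
cv = peak_cv(reps)   # per-peak CV in percent
```

In this toy case every peak has a CV below 5%. A CV of 5% means one standard deviation equals 0.05 of the mean intensity, which is one way to read the 0.95 to 1.05 band quoted above.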

Although, undoubtedly, technology is improving at a rapid rate, the gap between the
intrinsic accuracies of NMR spectroscopy and gene chip technology is so wide that it will
require a revolutionary rather than evolutionary improvement in gene expression
quantification methodology before it can rival the accuracy of NMR spectroscopy.

The intrinsic accuracy of NMR provides a distinct advantage when applying pattern 
recognition techniques. The multivariate nature of the NMR data means that 
classification of samples is possible using a combination of descriptors even when one
descriptor is not sufficient, because of the inherently low analytical variation in the data. 
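To illustrate why a combination of descriptors can classify samples when no single descriptor does, the following Python sketch uses a small synthetic, noise-free data set (not NMR data) in which the two classes overlap on each descriptor taken separately but are separated by the difference between the two descriptors. The helper name `ranges_overlap` is hypothetical.

```python
import numpy as np

# Synthetic, purely illustrative data: two classes measured on two descriptors.
x = np.linspace(0.0, 1.0, 11)
class_a = np.column_stack([x, x + 0.1])   # descriptor 2 sits just above descriptor 1
class_b = np.column_stack([x, x - 0.1])   # descriptor 2 sits just below descriptor 1

def ranges_overlap(a: np.ndarray, b: np.ndarray) -> bool:
    """True if the value ranges of two 1-D samples overlap."""
    return bool(a.min() <= b.max() and b.min() <= a.max())

# Neither descriptor on its own separates the classes...
overlap_d1 = ranges_overlap(class_a[:, 0], class_b[:, 0])
overlap_d2 = ranges_overlap(class_a[:, 1], class_b[:, 1])

# ...but their difference (a simple linear combination) does.
combo_a = class_a[:, 1] - class_a[:, 0]   # about +0.1 for every class-A sample
combo_b = class_b[:, 1] - class_b[:, 0]   # about -0.1 for every class-B sample
separated = bool(combo_a.min() > combo_b.max())
```

The gap of about 0.2 along the combined direction survives only because the synthetic data carry no measurement noise; noise comparable to that gap would erase the separation, which is why low analytical variation matters here.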

All biological fluids and tissues have their own characteristic physico-chemical
properties, and these affect the types of NMR experiment that may be usefully
employed. One major advantage of using NMR spectroscopy to study complex
biomixtures is that measurements can often be made with minimal sample preparation
(usually with only the addition of 5-10% D2O) and a detailed analytical profile can be
obtained on the whole biological sample. Sample volumes are small, typically 0.3 to 0.5
mL for standard probes, and as low as 3 µL for microprobes. Acquisition of simple NMR
spectra is rapid and efficient using flow-injection technology. It is usually necessary to
suppress the water NMR resonance.

Many biofluids are not chemically stable and for this reason care should be taken in their
collection and storage. For example, cell lysis in erythrocytes can easily occur. If a
substantial amount of D2O has been added, then it is possible that certain NMR
resonances will be lost by H/D exchange. Freeze-drying of biofluid samples also causes
the loss of volatile components such as acetone. Biofluids are also very prone to
microbiological contamination, especially fluids, such as urine, which are difficult to
collect under sterile conditions. Many biofluids contain significant amounts of active
enzymes, either normally or due to a disease state or organ damage, and these
enzymes may alter the composition of the biofluid following sampling. Samples should
be stored deep frozen to minimise the effects of such contamination. Sodium azide is
usually added to urine at the collection point to act as an antimicrobial agent. Metal ions
and/or chelating agents (e.g., EDTA) may be added to bind to endogenous metal ions
(e.g., Ca2+, Mg2+ and Zn2+) and chelating agents (e.g., free amino acids, especially
glutamate, cysteine, histidine and aspartate; citrate) to intentionally alter and/or enhance
the NMR spectrum.

In all cases the analytical problem usually involves the detection of "trace" amounts of
analytes in a very complex matrix of potential interferences. It is, therefore, critical to
choose a suitable analytical technique for the particular class of analyte of interest in the
particular biomatrix, which could be, for example, a biofluid or a tissue. High resolution
NMR spectroscopy (in particular 1H NMR) appears to be particularly appropriate. The
main advantages of using NMR spectroscopy in this area are the speed of the
method (with spectra being obtained in 5 to 10 minutes), the requirement for minimal
sample preparation, and the fact that it provides a non-selective detector for all
metabolites in the biofluid regardless of their structural type, provided only that they are
present above the detection limit of the NMR experiment and that they contain
non-exchangeable hydrogen atoms. The speed advantage is of crucial importance in this
area of work, as the clinical condition of a patient may require rapid diagnosis and can
change very rapidly, so correspondingly rapid changes must be made to the therapy
provided.

NMR studies of body fluids should ideally be performed at the highest magnetic field
available to obtain maximal dispersion and sensitivity, and most NMR studies have
been performed at 400 MHz or greater. With every new increase in available
spectrometer frequency the number of resonances that can be resolved in a biofluid
increases and, although this has the effect of solving some assignment problems, it also
poses new ones. Furthermore, there are still important problems of spectral
interpretation that arise due to compartmentation and binding of small molecules in the
organised macromolecular domains that exist in some biofluids such as blood plasma
and bile. All this complexity need not reduce the diagnostic capabilities and potential of
the technique, but demonstrates the problems of biological variation and the influence of
variation on diagnostic certainty.

The information content of biofluid spectra is very high and the complete assignment of
the NMR spectrum of most biofluids is usually not possible (even using 900 MHz
NMR spectroscopy). However, the assignment problems vary considerably between
biofluid types. Some fluids have near constant composition and concentrations and in
these the majority of the NMR signals have been assigned. In contrast, urine
composition can be very variable and there is enormous variation in the concentration
range of NMR-detectable metabolites; consequently, complete analysis is much more
difficult. Those metabolites present close to the limits of detection for 1-dimensional (1D)
NMR spectroscopy (typically ca. 100 nM at 800 MHz) pose severe NMR spectral
assignment problems. (In absolute terms, the detection limit may be ca. 4 nmol, e.g., 1
µg of a 250 g/mol compound in a 0.5 mL sample volume.) Even at the present level of
technology in NMR, it is not yet possible to detect many important biochemical
substances (e.g., hormones, some proteins, nucleic acids) in body fluids because of
problems with sensitivity, line widths, dispersion and dynamic range, and this area of
research will continue to be technology-limited. In addition, the collection of NMR
spectra of biofluids may be complicated by the relative water intensity, sample viscosity,
protein content, lipid content, and low molecular weight peak overlap.
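The absolute detection-limit example in parentheses above follows from the relation n = m/M between amount of substance, mass and molar mass. A trivial Python check of that arithmetic, with illustrative variable names:

```python
mass_g = 1e-6                  # 1 µg of compound, expressed in grams
molar_mass_g_per_mol = 250.0   # molar mass used in the example

amount_mol = mass_g / molar_mass_g_per_mol   # n = m / M
amount_nmol = amount_mol * 1e9               # convert mol to nmol
print(f"{amount_nmol:.1f} nmol")             # prints "4.0 nmol"
```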



Usually in order to assign NMR spectra, comparison is made with spectra of authentic
materials and/or by standard addition of an authentic reference standard to the sample.
Additional confirmation of assignments is usually sought from the application of other
NMR methods, including, for example, 2-dimensional (2D) NMR methods, particularly
COSY (correlation spectroscopy), TOCSY (total correlation spectroscopy),
inverse-detected heteronuclear correlation methods such as HMBC (heteronuclear
multiple bond correlation), HSQC (heteronuclear single quantum coherence), and HMQC
(heteronuclear multiple quantum coherence), 2D J-resolved (JRES) methods, spin-echo
methods, relaxation editing, diffusion editing (e.g., both 1D NMR and 2D NMR such as
diffusion-edited TOCSY), and multiple quantum filtering. Detailed NMR spectroscopic
data for a wide range of metabolites and biomolecules found in biofluids have been
published (see, for example, Lindon et al., 1999) and supplementary information is
available in several literature compilations of data (see, for example, Fan, 1996; Sze et
al., 1994).

For example, the successful application of NMR spectroscopy of biofluids to study a
variety of metabolic diseases and toxic processes has now been well established and
many novel metabolic markers of organ-specific toxicity have been discovered (see, for
example, Nicholson et al., 1989; Lindon et al., 1999). For example, the NMR spectrum of
urine is identifiably altered in situations where damage has occurred to the kidney or
liver. It has been shown that specific and identifiable changes can be observed which
distinguish the organ that is the site of a toxic lesion. It is also possible to focus in on
particular parts of an organ such as the cortex of the kidney and even, in favourable
cases, on very localised parts of the cortex.

It is also possible to deduce the biochemical mechanism of the xenobiotic toxicity, based
on a biochemical interpretation of the changes in the urine. A wide range of toxins has
now been investigated, mostly kidney and liver toxins, but also testicular
toxins, mitochondrial toxins and muscle toxins.

Pattern Recognition

However, a limiting factor in understanding the biochemical information from both 1D and
2D NMR spectra of tissues and biofluids is their complexity. The most efficient way to
investigate these complex multiparametric data is to employ the 1D and 2D NMR
metabonomic approach in combination with computer-based "pattern recognition" (PR)
methods and expert systems. These statistical tools are similar to those currently being
explored by workers in the fields of genomics and proteomics.

Pattern recognition (PR) methods can be used to reduce the complexity of data sets, to
generate scientific hypotheses and to test hypotheses. In general, the use of pattern
recognition algorithms allows the identification, and, with some methods, the
interpretation of some non-random behaviour in a complex system which can be
obscured by noise or random variations in the parameters defining the system. Also, the
number of parameters used can be very large such that visualisation of the regularities,
which for the human brain is best in no more than three dimensions, can be difficult.
Usually the number of measured descriptors is much greater than three and so simple
scatter plots cannot be used to visualise any similarity between samples. Pattern
recognition methods have been used widely to characterise many different types of
problem ranging, for example, over linguistics, fingerprinting, chemistry and psychology.
In the context of the methods described herein, pattern recognition is the use of
multivariate statistics, both parametric and non-parametric, to analyse spectroscopic
data, and hence to classify samples and to predict the value of some dependent variable
based on a range of observed measurements. There are two main approaches. One
set of methods is termed "unsupervised" and these simply reduce data complexity in a
rational way and also produce display plots which can be interpreted by the human eye.
The other approach is termed "supervised", whereby a training set of samples with known
class or outcome is used to produce a mathematical model, and this is then evaluated
with independent validation data sets.
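The supervised workflow just described (fit a model on a training set of known class, then evaluate it on independent validation data) can be sketched in Python as below. The nearest-centroid classifier is a deliberately simple stand-in chosen for brevity, not a method named in this specification; the data are synthetic and the helper names `fit_centroids` and `predict` are hypothetical.

```python
import numpy as np

def fit_centroids(X: np.ndarray, y: np.ndarray) -> dict:
    """Training step: store the mean descriptor vector of each known class."""
    return {label: X[y == label].mean(axis=0) for label in np.unique(y)}

def predict(centroids: dict, X: np.ndarray) -> np.ndarray:
    """Assign each sample to the class whose centroid is nearest (Euclidean)."""
    labels = list(centroids)
    dists = np.stack([np.linalg.norm(X - centroids[k], axis=1) for k in labels], axis=1)
    return np.array(labels)[dists.argmin(axis=1)]

rng = np.random.default_rng(0)
# Synthetic "spectra": 40 samples x 5 descriptors, two classes with shifted means.
X = np.vstack([rng.normal(0.0, 1.0, (20, 5)), rng.normal(3.0, 1.0, (20, 5))])
y = np.array([0] * 20 + [1] * 20)

# Hold out an independent validation set instead of scoring on the training data.
idx = rng.permutation(len(y))
train, valid = idx[:30], idx[30:]
model = fit_centroids(X[train], y[train])
accuracy = (predict(model, X[valid]) == y[valid]).mean()
```

Scoring only on the held-out indices in `valid` is the point of the sketch: accuracy measured on the training samples themselves would overstate how well the model generalises.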


Unsupervised PR methods are used to analyse data without reference to any other
independent knowledge, for example, without regard to the identity or nature of a
xenobiotic or its mode of action. Examples of unsupervised pattern recognition methods
include principal component analysis (PCA), hierarchical cluster analysis (HCA), and
non-linear mapping (NLM).
One of the most useful and easily applied unsupervised PR techniques is principal
components analysis (PCA) (see, for example, Kowalski et al., 1986). Principal
components (PCs) are new variables created from linear combinations of the starting
variables with appropriate weighting coefficients. The properties of these PCs are such
that: (i) each PC is orthogonal to (uncorrelated with) all other PCs, and (ii) the first PC
contains the largest part of the variance of the data set (information content) with
subsequent PCs containing correspondingly smaller amounts of variance.

PCA, a dimension reduction technique, takes m objects or samples, each described by
values in K dimensions (descriptor vectors), and extracts a set of eigenvectors, which
are linear combinations of the descriptor vectors. The eigenvectors and eigenvalues are
obtained by diagonalisation of the covariance matrix of the data. The eigenvectors can
be thought of as a new set of orthogonal plotting axes, called principal components
(PCs). The extraction of the systematic variations in the data is accomplished by
projection and modelling of the variance and covariance structure of the data matrix. The
primary axis is a single eigenvector describing the largest variation in the data, and is
termed principal component one (PC1). Subsequent PCs, ranked by decreasing
eigenvalue, describe successively less variability. The variation in the data that has not
been described by the PCs is called residual variance and signifies how well the model
fits the data. The projections of the descriptor vectors onto the PCs are defined as
scores, which reveal the relationships between the samples or objects. In a graphical
representation (a "scores plot" or eigenvector projection), objects or samples having
similar descriptor vectors will group together in clusters. Another graphical representation
is called a loadings plot, and this connects the PCs to the individual descriptor vectors,
and displays both the importance of each descriptor vector to the interpretation of a PC
and the relationship among descriptor vectors in that PC. In fact, a loading value is
simply the cosine of the angle which the original descriptor vector makes with the PC.
Descriptor vectors which fall close to the origin in this plot carry little information in the
PC, while descriptor vectors distant from the origin (high loading) are important in
interpretation.
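The eigendecomposition route described above can be sketched in a few lines. This is a minimal illustration only, assuming NumPy and a samples-by-descriptors matrix `X` (m rows, K columns); the function name `pca` and the synthetic data are hypothetical, and chemometrics software commonly uses SVD or NIPALS rather than forming the covariance matrix explicitly.

```python
import numpy as np

def pca(X, n_components=2):
    """Hypothetical helper: PCA by diagonalising the covariance matrix.

    X has shape (m samples, K descriptors). Returns scores (m, n),
    loadings (K, n) and the fraction of total variance per retained PC.
    """
    Xc = X - X.mean(axis=0)                  # mean-centre each descriptor
    cov = np.cov(Xc, rowvar=False)           # K x K covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # symmetric matrix; ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # rank PCs by decreasing eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, :n_components]     # unit eigenvectors, one column per PC
    scores = Xc @ loadings                   # projections of the samples onto the PCs
    explained = eigvals[:n_components] / eigvals.sum()
    return scores, loadings, explained

# Illustrative synthetic data: descriptor 0 carries most of the variance.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
X[:, 0] *= 10
scores, loadings, explained = pca(X, 2)
print(explained)
```

Because each loading column is a unit vector, its k-th entry is the cosine of the angle between descriptor axis k and that PC, which is the loading interpretation given in the text.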

Thus a plot of the first two or three PC scores gives the "best" representation, in terms of
information content, of the data set in two or three dimensions, respectively. A plot of the
first two principal component scores, PC1 and PC2, provides the maximum information
content of the data in two dimensions. Such PC maps can be used to visualise inherent
clustering behaviour, for example, for drugs and toxins based on similarity of their
metabonomic responses and hence mechanism of action. Of course, the clustering
information might be in lower PCs, and these also have to be examined.

Hierarchical Cluster Analysis, another unsupervised pattern recognition method, permits
the grouping of data points which are similar by virtue of being "near" to one another in
some multidimensional space. Individual data points may be, for example, the signal
intensities for particular assigned peaks in an NMR spectrum. A "similarity matrix," S, is
constructed with elements $s_{ij} = 1 - r_{ij}/r_{ij}^{\max}$, where $r_{ij}$ is the interpoint
distance between points i and j (e.g., Euclidean interpoint distance), and $r_{ij}^{\max}$ is the
largest interpoint distance for all points. The most distant pair of points will have $s_{ij}$
equal to 0, since $r_{ij}$ then equals $r_{ij}^{\max}$. Conversely, the closest pair of points will
have the largest $s_{ij}$. For two identical points, $s_{ij}$ is 1.
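As an illustration, the similarity matrix defined above can be computed directly from a set of points. This sketch assumes NumPy and Euclidean interpoint distances; `similarity_matrix` is a hypothetical helper name, and it does not guard against the degenerate case where all points coincide (largest distance of zero).

```python
import numpy as np

def similarity_matrix(points):
    """Hypothetical helper: s_ij = 1 - r_ij / r_max, with Euclidean r_ij."""
    P = np.asarray(points, dtype=float)
    diff = P[:, None, :] - P[None, :, :]
    r = np.sqrt((diff ** 2).sum(axis=-1))   # interpoint distances r_ij
    return 1.0 - r / r.max()                # most distant pair -> 0, identical -> 1

S = similarity_matrix([[0, 0], [3, 4], [0, 1]])
print(S)
```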

The similarity matrix is scanned for the closest pair of points. The pair of points is
reported with its separation distance, and then the two points are deleted and replaced
with a single combined point. The process is then repeated iteratively until only one
point remains. A number of different methods may be used to determine how two
clusters will be joined, including the nearest neighbour method (also known as the single
link method), the furthest neighbour method, and the centroid method (including centroid
link, incremental link, median link, group average link, and flexible link variations).

The reported connectivities are then plotted as a dendrogram (a tree-like chart which
allows visualisation of clustering), showing sample-sample connectivities versus
increasing separation distance (or equivalently, versus decreasing similarity). The
dendrogram has the property that the branch lengths are proportional to the
distances between the various clusters, and hence the length of the branches linking one
sample to the next is a measure of their similarity. In this way, similar data points may
be identified algorithmically.
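The iterative merging procedure can be sketched with the nearest neighbour (single link) rule. This is a deliberately naive illustration assuming NumPy; `single_link_merges` is a hypothetical name, and its cost grows steeply with the number of points. The recorded merge distances are the heights at which branches join in the dendrogram; library routines such as SciPy's hierarchical clustering functions implement this and the other linkage rules efficiently.

```python
import numpy as np

def single_link_merges(points):
    """Hypothetical helper: agglomerative clustering with the single link rule.

    Returns (cluster_a, cluster_b, distance) merges in the order performed;
    the distances are the dendrogram heights.
    """
    P = np.asarray(points, dtype=float)
    dist = np.sqrt(((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=-1))
    clusters = {i: frozenset([i]) for i in range(len(P))}
    merges = []
    while len(clusters) > 1:
        best = None
        keys = sorted(clusters)
        for ai, a in enumerate(keys):
            for b in keys[ai + 1:]:
                # single link: distance between the closest members of two clusters
                d = min(dist[i, j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        merges.append((clusters[a], clusters[b], d))
        clusters[a] = clusters[a] | clusters.pop(b)  # replace the pair by one combined cluster
    return merges

merges = single_link_merges([[0], [1], [5], [6], [20]])
for a, b, d in merges:
    print(sorted(a), sorted(b), d)
```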

Non-linear mapping (NLM) is a simple concept which involves calculation of the
distances between all of the points in the original K dimensions. This is followed by
construction of a map of points in 2 or 3 dimensions where the sample points are placed
in random positions or at values determined by a prior principal components analysis.
The least squares criterion is used to move the sample points in the lower dimension
map to fit the inter-point distances in the lower dimension space to those in the K
dimensional space. Non-linear mapping is therefore an approximation to the true inter-point
distances, but points close in K-dimensional space should also be close in 2 or 3
dimensional space (see, for example, Brown et al., 1996; Farrant et al., 1992).
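A minimal sketch of this least-squares mapping, assuming NumPy: points start at their PCA projection and are moved by plain gradient descent on the stress, the sum over pairs of (map distance minus original distance) squared. The function names, step size and iteration count are illustrative assumptions; published NLM variants (for example Sammon mapping) weight the pairs differently.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance between every pair of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def stress(Y, D):
    """Least-squares mismatch between map distances and original distances."""
    return 0.5 * ((pairwise_distances(Y) - D) ** 2).sum()  # each pair counted once

def nonlinear_map(X, dim=2, steps=500, lr=0.005):
    """Hypothetical helper: gradient-descent NLM started from a prior PCA."""
    X = np.asarray(X, dtype=float)
    D = pairwise_distances(X)                    # distances in the original K dimensions
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Y = Xc @ Vt[:dim].T                          # starting map from the first PCs
    for _ in range(steps):
        d = pairwise_distances(Y)
        np.fill_diagonal(d, 1.0)                 # avoid 0/0 on the diagonal
        coef = (d - D) / d
        np.fill_diagonal(coef, 0.0)
        # gradient of the stress with respect to each map point y_i
        grad = 2.0 * (coef[:, :, None] * (Y[:, None, :] - Y[None, :, :])).sum(axis=1)
        Y = Y - lr * grad                        # move points to reduce the stress
    return Y, D

rng = np.random.default_rng(1)
X = rng.normal(size=(12, 5))                     # 12 synthetic samples, K = 5
Y0, D = nonlinear_map(X, steps=0)                # PCA starting configuration
Y, _ = nonlinear_map(X)
print(stress(Y0, D), stress(Y, D))
```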

In this simple metabonomic approach, a sample from an animal treated with a compound
of unknown toxicity is compared with a database of NMR-generated metabolic data from
control and toxin-treated animals. By observing its position on the PR map relative to
samples of known effect, the unknown toxin can often be classified. The same approach
can be used for human samples for classification according to disease. However, such
data are often more complex, with time-related biochemical changes detected by NMR.
Also, it is more rigorous to compare effects of xenobiotics in the original K-dimensional
NMR metabonomic space.
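One simple way to make "position on the PR map relative to samples of known effect" concrete is to project the unknown sample onto a PCA model built from the reference database and assign it to the class whose centroid is nearest in score space. This is an illustrative sketch assuming NumPy; the nearest-centroid rule, the function name and the toy data are assumptions, not the procedure used in this document.

```python
import numpy as np

def classify_on_pc_map(X_ref, labels, x_new, n_components=2):
    """Hypothetical helper: nearest class centroid on a reference PCA scores map."""
    X_ref = np.asarray(X_ref, dtype=float)
    labels = np.asarray(labels)
    mean = X_ref.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_ref - mean, full_matrices=False)
    P = Vt[:n_components].T                                  # reference PCA loadings
    ref_scores = (X_ref - mean) @ P
    new_score = (np.asarray(x_new, dtype=float) - mean) @ P  # same projection for the unknown
    classes = np.unique(labels)
    centroids = np.array([ref_scores[labels == c].mean(axis=0) for c in classes])
    nearest = np.argmin(np.linalg.norm(centroids - new_score, axis=1))
    return classes[nearest], new_score

# Toy reference database: controls and two toxin classes (synthetic values).
X_ref = [[0, 0, 0], [0.2, -0.1, 0], [-0.1, 0.1, 0.1],
         [5, 0, 0], [5.2, 0.1, 0], [4.9, -0.1, 0.1],
         [0, 5, 0], [0.1, 5.1, 0], [-0.1, 4.9, 0.1]]
labels = ["control"] * 3 + ["toxin_A"] * 3 + ["toxin_B"] * 3
label, score = classify_on_pc_map(X_ref, labels, [4.8, 0.2, 0.0])
print(label)
```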

Alternatively, and in order to develop automatic classification methods, it has proved
efficient to use a "supervised" approach to NMR data analysis. Here, a "training set" of
NMR metabonomic data is used to construct a statistical model that predicts correctly the
"class" of each sample. The model is then tested with independent data (referred
to as a test or validation set) to determine its robustness.
These models are sometimes termed "expert systems," but may be based on a range of
different mathematical procedures. Supervised methods can use a data set with
reduced dimensionality (for example, the first few principal components), but typically
use unreduced data, with all dimensionality. In all cases the methods allow the
quantitative description of the multivariate boundaries that characterise and separate
each class, for example, each class of xenobiotic in terms of its metabolic effects. It is
also possible to obtain confidence limits on any predictions, for example, a level of
probability to be placed on the goodness of fit (see, for example, Kowalski et al., 1986).
The robustness of the predictive models can also be checked using cross-validation, by
leaving out selected samples from the analysis.
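Cross-validation by leaving out samples can be sketched as leave-one-out: the model is refitted without each sample in turn and then asked to predict that sample. This sketch assumes NumPy and uses a nearest class centroid classifier purely as a stand-in model; the helper names and the toy data are hypothetical.

```python
import numpy as np

def fit_centroids(X, y):
    """Stand-in supervised model: one mean vector per class."""
    classes = np.unique(y)
    return classes, np.array([X[y == c].mean(axis=0) for c in classes])

def predict_centroids(model, X):
    classes, centroids = model
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
    return classes[np.argmin(d, axis=1)]

def leave_one_out_accuracy(X, y):
    """Refit with each sample left out in turn and predict that sample."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    correct = 0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i            # leave sample i out of training
        model = fit_centroids(X[keep], y[keep])
        correct += int(predict_centroids(model, X[i:i + 1])[0] == y[i])
    return correct / len(y)

X = [[0.0, 0.0], [0.1, 0.2], [-0.2, 0.1], [3.0, 3.0], [3.1, 2.9], [2.8, 3.2]]
y = ["control"] * 3 + ["toxin"] * 3
print(leave_one_out_accuracy(X, y))   # 1.0 for these well-separated groups
```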

Expert systems may operate to generate a variety of useful outputs, for example,
(i) classification of the sample as "normal" or "abnormal" (this is a useful tool in the
control of spectrometer automation, e.g., using sequential flow injection NMR
spectroscopy); (ii) classification of the target organ for toxicity and site of action within
the tissue, where in certain cases the mechanism of toxic action may also be classified; and
(iii) identification of the biomarkers of a pathological disease condition or toxic effect for
the particular compound under study. For example, a sample can be classified as
belonging to a single class of toxicity, to multiple classes of toxicity (more than one target
organ), or to no class. The latter case would indicate deviation from normality (control)
based on the training set model but having a dissimilar metabolic effect to any toxicity
class modelled in the training set (unknown toxicity type). Under (ii), a system could also
be generated to support decisions in clinical medicine (e.g., for efficacy of drugs) rather
than toxicity.

Examples of supervised pattern recognition methods include the following:

soft independent modelling of class analogy (SIMCA) (see, for example, Wold, 1976);

partial least squares analysis (PLS) (see, for example, Wold, 1966; Joreskog,
1982; Frank, 1984; Bro, R., 1997);

linear discriminant analysis (LDA) (see, for example, Nilsson, 1965);

K-nearest neighbour analysis (KNN) (see, for example, Brown et al., 1996);

artificial neural networks (ANN) (see, for example, Wasserman, 1989; Anker et
al., 1992; Hare, 1994);

probabilistic neural networks (PNNs) (see, for example, Parzen, 1962; Bishop,
1995; Specht, 1990; Broomhead et al., 1988; Patterson, 1996);

rule induction (RI) (see, for example, Quinlan, 1986); and,

Bayesian methods (see, for example, Bretthorst, 1990a, 1990b, 1988).
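Of the methods listed, K-nearest neighbour analysis is the simplest to illustrate: a new sample is given the majority class among its k closest training samples. The following is a minimal sketch assuming NumPy and Euclidean distance; `knn_predict` and the toy two-class data are hypothetical.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Hypothetical helper: majority class among the k nearest training samples."""
    X_train = np.asarray(X_train, dtype=float)
    d = np.linalg.norm(X_train - np.asarray(x, dtype=float), axis=1)  # Euclidean distances
    nearest = np.argsort(d, kind="stable")[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_train = ["control"] * 3 + ["disease"] * 3
print(knn_predict(X_train, y_train, [0.5, 0.5]))   # control
print(knn_predict(X_train, y_train, [5.5, 5.2]))   # disease
```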

As the size of metabonomic databases increases together with improvements in rapid
throughput of NMR samples (> 300 samples per day per spectrometer is now possible
with the first generation of flow injection systems), more subtle expert systems may be
necessary, for example, using techniques such as "fuzzy logic" which permit greater
flexibility in decision boundaries.

Application to Metabonomics 

Pattern recognition methods have been applied to the analysis of metabonomic data.
See, for example, Lindon et al., 2001. A number of spectroscopic techniques have been
used to generate the data, including NMR spectroscopy and mass spectrometry. Pattern
recognition analysis of such data sets has been successful in some cases. The
successful studies include, for example, complex NMR data from biofluids (see, for
example, Anthony et al., 1994; Anthony et al., 1995; Beckwith-Hall et al., 1998; Gartland
et al., 1990a; Gartland et al., 1990b; Gartland et al., 1991; Holmes et al., 1998a; Holmes
et al., 1998b; Holmes et al., 1992; Holmes et al., 1994; Spraul et al., 1994; Tranter et al.,
1999), conventional NMR spectra from tissue samples (Somorjai et al., 1995),
magic-angle-spinning (MAS) NMR spectra of tissues (Garrod et al., 2001), in vivo NMR spectra
(Morvan et al., 1990; Howells et al., 1993; Stoyanova et al., 1995; Kuesel et al., 1996;
Confort-Gouny et al., 1992; Weber et al., 1998), wines (Martin et al., 1998, 1999) and
plant tissues (Kopka et al., 2000).

Although the utility of the metabonomic approach is well established, its full potential has
not yet been exploited. The metabolic variation is often subtle, and powerful analysis
methods are required for detection of particular analytes, especially when the data (e.g.,
NMR spectra) are so complex. For example, all that has been previously proposed is
still not generally sufficient to achieve clinically useful diagnosis of disease. New
methods to extract useful metabolic information from biofluids are needed.

The inventors have developed novel methods (which employ multivariate statistical
analysis and pattern recognition (PR) techniques, and optionally data filtering
techniques) of analysing data (e.g., NMR spectra) from a test population which yield
accurate mathematical models which may subsequently be used to classify a test
sample or subject, and/or in diagnosis.

Unlike methods previously described, the methods described herein have the power to
provide clinically useful and accurate diagnostic and prognostic information in a medical
setting.

The methods described herein represent a significant advance over chemometric
methodologies described previously. Although chemometrics has been able to provide
some classification of types previously, the studies have required that the classification
be done under a series of restrictions which limit the ability to apply the method to
analysis of complex datasets as would be required to apply the method for the practical
diagnosis/prognosis of diseases that could be useful clinically.

For example, several studies have reported on the classification of animals on the basis
of an NMR spectrum of urine or plasma. Although these studies clearly demonstrate the
potential of the technique, they are limited because the animals which compose each
class are genetically homogeneous (in-bred populations). As a result, these methods
have been demonstrated to be able to detect patterns but only against "low noise"
backgrounds. Application of metabonomics to populations (e.g., in human clinical
practice) requires the ability to detect patterns against the substantial noise due to the
genetic variation of out-bred populations and also due to dietary and hormonal
differences.

Similarly, many of the studies described to date have examined relatively major
differences between groups, for example, the ability to differentiate renally acting toxins
from liver acting toxins. The two groups under study differed in a broad spectrum of
metabolites, making the pattern relatively easy to detect. In conjunction with the
restriction of using in-bred populations of animals, most studies published to date have
only demonstrated metabonomics to be practicable under conditions of high "signal to
noise" ratio, conditions which are very different from the human clinical environment.

Some studies have begun to attempt classifications of out-bred human populations
where the data variation is high. However, to date, all these studies have simplified the
system substantially to focus in on specific molecules: for example, some studies have
looked specifically at the resonances associated with lipoproteins. Since lipoproteins are
major constituents of plasma, the variance they contribute readily exceeds the
background variance due to genetic and environmental differences between individuals.
Unfortunately, such an approach is insufficiently powerful to identify weak patterns
against the background biochemical noise, and could not be used, for example, to
determine the extent of coronary heart disease or to distinguish identical from
non-identical twins. Identification of such low "signal to noise" ratio patterns requires the
application of the methods of this invention, which represent a significant advance over
what has been previously reported.



SUMMARY OF THE INVENTION

One aspect of the present invention pertains to a method of classifying a sample, as
described herein.

One aspect of the present invention pertains to a method of classifying a subject as 
described herein. 

One aspect of the present invention pertains to a method of diagnosing a subject as
described herein.

One aspect of the present invention pertains to a method of identifying a diagnostic
species, or a combination of a plurality of diagnostic species, for a predetermined
condition, as described herein.

One aspect of the present invention pertains to a diagnostic species identified by a 
method as described herein. 

One aspect of the present invention pertains to a diagnostic species identified by a
method as described herein, for use in a method of classification.

One aspect of the present invention pertains to a method of classification which employs
or relies upon one or more diagnostic species identified by a method as described herein.

One aspect of the present invention pertains to use of one or more diagnostic species 
identified by a method of classification as described herein. 

One aspect of the present invention pertains to an assay for use in a method of
classification, which assay relies upon one or more diagnostic species identified by a
method as described herein.

One aspect of the present invention pertains to use of an assay in a method of
classification, which assay relies upon one or more diagnostic species identified by a
method as described herein.
One aspect of the present invention pertains to a method of therapeutic monitoring of a
subject undergoing therapy which employs a method of classification as described 
herein. 

One aspect of the present invention pertains to a method of evaluating drug therapy
and/or drug efficacy which employs a method of classification, as described herein.

One aspect of the present invention pertains to a computer system or device, such as a
computer or linked computers, operatively configured to implement a method as
described herein; and related computer code, computer programs, data carriers carrying
such code and programs, and the like.

These and other aspects of the present invention are described herein. 






As will be appreciated by one of skill in the art, features and preferred embodiments of
one aspect of the present invention will also pertain to other aspects of the present
invention.
BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1A-OP is a scores scatter plot for PC2 and PC1 (t2 vs. t1) for the principal
components analysis (PCA) model derived from 1-D NMR spectra from serum
samples from control subjects (triangles, ▲) and patients with osteoporosis (circles, •).

Figure 1B-OP is the corresponding loadings scatter plot (p2 vs. p1) for the PCA shown in
Figure 1A-OP.

Figure 1C-OP is a scores scatter plot for PC2 and PC1 (t2 vs. t1) for the PCA model
derived from 1-D NMR spectra from serum samples from control subjects (triangles,
▲) and patients with osteoporosis (circles, •). Prior to PCA, the data were filtered (in
this case, using orthogonal signal correction, OSC).

Figure 1D-OP is the corresponding loadings scatter plot (p2 vs. p1) for the PCA shown in
Figure 1C-OP.

Figure 1E-OP is a scores scatter plot for PC2 and PC1 (t2 vs. t1) for the PLS-DA model
derived from 1-D NMR spectra from serum samples from control subjects (triangles,
▲) and patients with osteoporosis (circles, •). Prior to PLS-DA, the data were filtered (in
this case, using orthogonal signal correction, OSC).

Figure 1F-OP is the corresponding loadings scatter plot (p2 vs. p1) for the PLS-DA model
shown in Figure 1E-OP.

Figure 2A-OP shows a section of the variable importance plot (VIP) derived from the
PLS-DA model described in Figure 1E-OP.

Figure 2B-OP shows a section of the regression coefficient plot derived from the PLS-DA
model described in Figure 1E-OP.

Figure 3-OP is a y-predicted scatter plot for a PLS-DA model calculated using ~85% of
the control (triangles, ▲) and osteoporosis (circles, •) samples, which was then used to
predict the presence of disease in the remaining 15% of samples (squares, ■) (the
validation set).
DETAILED DESCRIPTION OF THE INVENTION

Introduction 

The inventors have developed novel methods (which employ multivariate statistical
analysis and pattern recognition (PR) techniques, and optionally data filtering
techniques) of analysing data (e.g., NMR spectra) from a test population which yield
accurate mathematical models which may subsequently be used to classify a test
sample or subject, and/or in diagnosis.

An NMR spectrum provides a fingerprint or profile for the sample to which it pertains.
Such spectra represent a measure of all NMR-detectable species present in the sample
(rather than a select few) and also, to some extent, interactions between these species.
As such, these spectra are characterised by a high data density which, heretofore, has
not been fully exploited.

The methods described herein facilitate the analysis of such spectra, and the
subsequent use of the results of that analysis to classify test spectra (and therefore the
associated samples and subjects, if applicable) according to one or more distinguishing
criteria, at a discrimination level never before achieved.

These methods find particular application in the field of medicine. For example, analysis
of NMR spectra for samples taken from a population characterised by a certain condition
yields a mathematical model which can be used to classify an NMR spectrum for a
sample from a test subject as positive (also having the condition) or negative (not having
the condition) with a high degree of confidence.

In effect, these methods facilitate the identification of the particular combination of
amounts of (e.g., endogenous) species which are invariably associated with the
presence of the condition. These combinations (patterns), which typically comprise
many (often small) uncorrelated variances which together are diagnostic, are encoded
within the high data density of the NMR spectra. The methods described herein permit
their identification and subsequent use for classification.
However, it must be stressed that metabonomic analysis based on NMR spectra is much
more powerful than simply using a high technology analytical tool (the NMR
spectrometer) to measure the levels of known metabolites. That is, the methods
described herein are distinct from methods which simply carry out multiple independent
measures of discrete chemical entities (e.g., LDL cholesterol concentration).

For example, considering the variance in NMR spectral intensity (total peak intensity) in
any particular defined chemical shift region (known as a bucket or bin), a part of that
variance may be associated with a given molecule (a biomarker), the level of which
varies consistently as a result of the condition under study. The remainder of the
variance may be due to differences in the levels of other molecules which give peaks in
that integral region but which are unrelated to the condition under study (e.g., individual
to individual differences such as dietary factors, age, gender, etc.).
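A minimal sketch of the bucketing step, assuming NumPy and a spectrum given as parallel arrays of chemical shift (ppm) and intensity. The 0.04 ppm default width follows the bucket width quoted later in this description; the function name, the range arguments and the toy data are illustrative assumptions, and practical pipelines usually also exclude regions such as the water resonance and normalise the buckets, which is omitted here.

```python
import numpy as np

def bucket_spectrum(ppm, intensity, start, stop, width=0.04):
    """Hypothetical helper: sum spectral intensity into fixed-width buckets.

    Buckets are half-open [edge_k, edge_k+1) in ppm, except that
    np.histogram closes the last bucket at its upper edge.
    Returns bucket centres (ppm) and the summed intensity per bucket.
    """
    ppm = np.asarray(ppm, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    n = int(round((stop - start) / width))
    edges = start + width * np.arange(n + 1)
    sums, _ = np.histogram(ppm, bins=edges, weights=intensity)
    centres = 0.5 * (edges[:-1] + edges[1:])
    return centres, sums

ppm = [1.21, 1.22, 1.25, 1.30]
intensity = [1.0, 2.0, 3.0, 4.0]
centres, sums = bucket_spectrum(ppm, intensity, start=1.20, stop=1.32)
print(centres, sums)
```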

The methods described herein, which employ pattern recognition techniques, permit
identification of that NMR peak intensity which is related to the condition under study,
even though only a small part of the variance in a spectral region (bucket) may be
related to the condition under study. The identification power is enhanced by the
application of data filtering techniques (e.g., orthogonal signal correction, OSC) which
can lower the influence of buckets with variance unrelated to the condition of interest.
Actual identification of the molecular biomarkers contributing to significant buckets is
carried out by reexamination of the original NMR spectra by NMR experts, and could
involve additional NMR spectroscopic experiments such as 2-dimensional NMR
spectroscopy; separation of putative substances and their identification using
HPLC-NMR-MS; addition of authentic substance to the sample and re-measuring the
NMR spectrum, checking for coincidence of NMR peaks; etc.
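The idea behind OSC-type filtering can be illustrated with a deliberately simplified, single-step variant: take the largest-variance score of the data, make it orthogonal to the class vector, and deflate the data by that component. This sketch assumes NumPy and is not the full iterative OSC algorithm of Wold and co-workers, nor the specific filtering used in this document; `simple_osc` and the synthetic data are hypothetical.

```python
import numpy as np

def simple_osc(X, y):
    """Hypothetical single-step, OSC-style filter (simplified illustration).

    Removes from mean-centred X its first principal component score after
    that score has been made orthogonal to the centred class vector y.
    """
    Xc = np.asarray(X, dtype=float)
    Xc = Xc - Xc.mean(axis=0)
    yc = np.asarray(y, dtype=float)
    yc = yc - yc.mean()
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    t = Xc @ Vt[0]                              # largest-variance score of X
    t_orth = t - yc * (yc @ t) / (yc @ yc)      # remove the part of t correlated with y
    p = Xc.T @ t_orth / (t_orth @ t_orth)       # loading of the y-orthogonal component
    return Xc - np.outer(t_orth, p), t_orth     # filtered data and removed score

rng = np.random.default_rng(0)
y = np.array([0, 1] * 10)                       # e.g. control (0) vs. condition (1)
X = np.column_stack([
    10.0 * rng.normal(size=20),                 # large variance unrelated to y
    y + 0.1 * rng.normal(size=20),              # small variance related to y
])
X_filtered, t_orth = simple_osc(X, y)
print(X[:, 0].var(), X_filtered[:, 0].var())
```

The y-related column keeps its covariance with the class vector, while the large y-unrelated column is strongly reduced, which is the effect the text attributes to filtering.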

For example, in NMR spectra of blood plasma, in the region around δ 1.2-1.3, a number
of peaks appear, all of which will contribute to the intensity in those buckets labelled
δ 1.30 (e.g., the chemical shift region δ 1.32-1.28), δ 1.26 (e.g., the region δ 1.28-1.24),
and δ 1.22 (e.g., the region δ 1.24-1.20). Given the bucket width of 0.04 ppm (i.e., 24 Hz
at 600 MHz), the wings of the Lorentzian lines of the NMR resonances will have
contributions in most or all of these buckets even though the peak maximum appears in
a single bucket. The two main broad NMR peak envelopes in this region of the spectrum
have been assigned to the long chain methylene groups of the fatty acyl chains of
lipoproteins, and in addition there are a number of small molecule metabolites which
have NMR resonances in this region, some of which have been assigned. See, e.g.,
Nicholson et al., 1995. These include the methyl resonances of lactate (a doublet at δ
1.33), threonine (a doublet at δ 1.32), fucose (a doublet at δ 1.31), in some cases
3-hydroxybutyrate (a doublet at δ 1.20) and part of the methylene resonance of
isoleucine (a multiplet at δ 1.28). The two overlapping lipoprotein peaks have been
assigned as mainly VLDL at δ 1.29 and mainly LDL at δ 1.25. However, both of these
signals are asymmetric in appearance and consist of a number of overlapping
resonances. By examination of the NMR spectra of individual lipoprotein fractions, it
has been possible to use mathematical deconvolution techniques to show that this
composite envelope in the δ 1.3-1.2 region consists of two bands from VLDL, 3
bands from LDL and 2 bands from HDL. See, e.g., M. Ala-Korpela, Progress in NMR
Spectroscopy, 27, 475-554 (1995). In fact, the inventors have shown that the variance
in the spectral intensity in the bucket at δ 1.30 is only weakly correlated with the LDL
level measured independently for a panel of 100 patients. The correlation coefficient (r)
between the level of LDL as measured by a conventional method and the bucket
intensity at δ 1.30 in the NMR spectra of the same samples is only 0.45. Therefore, the
changes in the concentration of LDL over the samples in this panel of 100 patients only
account for about 20% of the variance in this bucket intensity, since the fraction of
variance explained by a linear relationship is $r^2$ ($0.45^2 \approx 0.20$). Thus the variance
in the intensity in the δ 1.30 bucket, over the sample population, contains much more
information than solely the variance in the LDL concentration. The methods of the
present invention permit the determination and exploitation of such additional, until
now hidden, information.
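The spill-over of a Lorentzian line into neighbouring buckets, and the ppm-to-Hz conversion quoted above, can be checked numerically. This sketch uses only the Python standard library; the line centre (δ 1.29) is taken from the text, but the half-width at half-maximum of 0.01 ppm (6 Hz at 600 MHz) is an assumed, illustrative value, and the helper names are hypothetical.

```python
import math

def ppm_to_hz(delta_ppm, spectrometer_mhz):
    """Convert a chemical-shift interval in ppm to Hz at a given spectrometer frequency."""
    return delta_ppm * spectrometer_mhz

def lorentzian_fraction(lo, hi, centre, hwhm):
    """Fraction of a Lorentzian line's total area lying between lo and hi (ppm).

    Uses the Lorentzian cumulative distribution, atan((x - centre) / hwhm) / pi + 1/2.
    """
    return (math.atan((hi - centre) / hwhm) - math.atan((lo - centre) / hwhm)) / math.pi

# Assumed example line: centre delta 1.29, HWHM 0.01 ppm (illustrative, not measured).
buckets = [(1.28, 1.32), (1.24, 1.28), (1.20, 1.24)]   # labelled 1.30, 1.26, 1.22
fractions = [lorentzian_fraction(lo, hi, 1.29, 0.01) for lo, hi in buckets]
print(ppm_to_hz(0.04, 600))                    # bucket width in Hz
print([round(f, 2) for f in fractions])        # roughly [0.65, 0.19, 0.03]
```

Under these assumptions only about two thirds of the line's area falls in the bucket containing its maximum, which illustrates why several buckets share intensity from the same resonance.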

Furthermore, the methods can be applied to achieve classification into multiple
categories on the basis of a single dataset (e.g., an NMR spectrum for a single sample).
Due to the very high data density of the input dataset, the analysis method can
separately (i.e., in parallel) or sequentially (i.e., in series) perform multiple classifications.
For example, a single blood sample could be used to determine (e.g., diagnose) the
presence or absence of several, or indeed, many, (e.g., unrelated) conditions or
diseases.

Thus, one aspect of the present invention pertains to improved methods for the analysis
of chemical, biochemical, and biological data, for example, spectra, for example, nuclear
magnetic resonance (NMR) and other types of spectra.
These techniques have been applied to the analysis of blood serum in the context of
osteoporosis. For example, the metabonomic analysis can distinguish between
individuals with and without osteoporosis. Novel diagnostic biomarkers for osteoporosis
have been identified, and associated methods for diagnosis have been described.

Methods of Classifying, Diagnosing

One aspect of the present invention pertains to a method of classifying a sample, as
described herein.

One aspect of the present invention pertains to a method of classifying a subject by
classifying a sample from said subject, wherein said method of classifying a sample is as
described herein.

One aspect of the present invention pertains to a method of diagnosing a subject by 
classifying a sample from said subject; wherein said method of classifying a sample is as 
described herein. 


Classifying a Sample: By NMR Spectral Intensity

One aspect of the present invention pertains to a method of classifying a sample, said
method comprising the step of relating NMR spectral intensity at one or more
predetermined diagnostic spectral windows for said sample with a predetermined
condition.

One aspect of the present invention pertains to a method of classifying a sample from a
subject, said method comprising the step of relating NMR spectral intensity at one or
more predetermined diagnostic spectral windows for said sample with a predetermined
condition of said subject.

One aspect of the present invention pertains to a method of classifying a sample, said
method comprising the step of relating NMR spectral intensity at one or more
predetermined diagnostic spectral windows for said sample with the presence or
absence of a predetermined condition.
One aspect of the present invention pertains to a method of classifying a sample from a
subject, said method comprising the step of relating NMR spectral intensity at one or
more predetermined diagnostic spectral windows for said sample with the presence or
absence of a predetermined condition of said subject.

One aspect of the present invention pertains to a method of classifying a sample, said
method comprising the step of relating a modulation of NMR spectral intensity, relative to
a control value, at one or more predetermined diagnostic spectral windows for said
sample with a predetermined condition.

One aspect of the present invention pertains to a method of classifying a sample from a
subject, said method comprising the step of relating a modulation of NMR spectral
intensity, relative to a control value, at one or more predetermined diagnostic spectral
windows for said sample with a predetermined condition of said subject.

One aspect of the present invention pertains to a method of classifying a sample, said
method comprising the step of relating a modulation of NMR spectral intensity, relative to
a control value, at one or more predetermined diagnostic spectral windows for said
sample with the presence or absence of a predetermined condition.

One aspect of the present invention pertains to a method of classifying a sample from a
subject, said method comprising the step of relating a modulation of NMR spectral
intensity, relative to a control value, at one or more predetermined diagnostic spectral
windows for said sample with the presence or absence of a predetermined condition of
said subject.
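The step of relating a modulation of spectral intensity, relative to a control value, at predetermined windows to a condition can be sketched as follows. This is a hypothetical illustration assuming NumPy: the window positions are placeholders (not the diagnostic windows of this invention), and the z-score threshold rule stands in for the pattern recognition models described elsewhere in this document.

```python
import numpy as np

# Hypothetical diagnostic windows (ppm ranges); placeholders only, not values
# taken from this document.
DIAGNOSTIC_WINDOWS = [(1.28, 1.32), (3.20, 3.24)]

def window_intensities(ppm, intensity, windows):
    """Summed spectral intensity within each [lo, hi) chemical-shift window."""
    ppm = np.asarray(ppm, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    return np.array([intensity[(ppm >= lo) & (ppm < hi)].sum() for lo, hi in windows])

def classify_by_modulation(sample, control_mean, control_sd, z_threshold=2.0):
    """Hypothetical rule: 'present' if any window deviates strongly from control."""
    z = (np.asarray(sample, dtype=float) - control_mean) / control_sd  # modulation vs. control
    label = "present" if np.any(np.abs(z) > z_threshold) else "absent"
    return label, z

ppm = np.array([1.29, 1.30, 3.21, 4.00])
intensity = np.array([1.0, 2.0, 3.0, 4.0])
sample = window_intensities(ppm, intensity, DIAGNOSTIC_WINDOWS)
label, z = classify_by_modulation(sample, control_mean=np.array([3.0, 1.0]),
                                  control_sd=np.array([0.5, 0.5]))
print(sample, label, z)
```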

Classifying a Subject: By NMR Spectral Intensity

One aspect of the present invention pertains to a method of classifying a subject, said
method comprising the step of relating NMR spectral intensity at one or more
predetermined diagnostic spectral windows for a sample from said subject with a
predetermined condition of said subject.

One aspect of the present invention pertains to a method of classifying a subject, said
method comprising the step of relating NMR spectral intensity at one or more
predetermined diagnostic spectral windows for a sample from said subject with the
presence or absence of a predetermined condition of said subject.

One aspect of the present invention pertains to a method of classifying a subject, said
method comprising the step of relating a modulation of NMR spectral intensity, relative to
a control value, at one or more predetermined diagnostic spectral windows for a sample
from said subject with a predetermined condition of said subject.

One aspect of the present invention pertains to a method of classifying a subject, said
method comprising the step of relating a modulation of NMR spectral intensity, relative to
a control value, at one or more predetermined diagnostic spectral windows for a sample
from said subject with the presence or absence of a predetermined condition of said
subject.

Diagnosing a Subject: By NMR Spectral Intensity

One aspect of the present invention pertains to a method of diagnosing a predetermined
condition of a subject, said method comprising the step of relating NMR spectral intensity
at one or more predetermined diagnostic spectral windows for a sample from said
subject with said predetermined condition of said subject.

One aspect of the present invention pertains to a method of diagnosing a predetermined
condition of a subject, said method comprising the step of relating NMR spectral intensity
at one or more predetermined diagnostic spectral windows for a sample from said
subject with the presence or absence of said predetermined condition of said subject.

One aspect of the present invention pertains to a method of diagnosing a predetermined
condition of a subject, said method comprising the step of relating a modulation of NMR
spectral intensity, relative to a control value, at one or more predetermined diagnostic
spectral windows for a sample from said subject with said predetermined condition of
said subject.

One aspect of the present invention pertains to a method of diagnosing a predetermined
condition of a subject, said method comprising the step of relating a modulation of NMR
spectral intensity, relative to a control value, at one or more predetermined diagnostic
spectral windows for a sample from said subject with the presence or absence of said
predetermined condition of said subject.
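The relating step in the aspects above can be illustrated with a minimal sketch. It assumes a spectrum supplied as paired chemical-shift (ppm) and intensity lists, integrates the intensity inside each predetermined diagnostic window, expresses each integral as a relative modulation against a control value, and applies a simple threshold rule. The window bounds, control values, the 20% threshold and the "any window" rule are hypothetical placeholders, not values or rules taken from this specification.

```python
# Minimal sketch (hypothetical windows, controls and threshold): relate NMR
# spectral intensity in predetermined diagnostic windows to a condition.

def window_intensity(ppm, intensity, lo, hi):
    """Sum the intensities of all points whose chemical shift lies in [lo, hi] ppm."""
    return sum(y for x, y in zip(ppm, intensity) if lo <= x <= hi)


def modulation(value, control):
    """Relative change of a window intensity with respect to a control value."""
    return (value - control) / control


def classify(ppm, intensity, windows, controls, threshold=0.2):
    """Return ('present' or 'absent', per-window modulations).

    Hypothetical rule: the condition is called present if any diagnostic
    window deviates from its control value by more than `threshold`.
    """
    mods = [modulation(window_intensity(ppm, intensity, lo, hi), c)
            for (lo, hi), c in zip(windows, controls)]
    label = "present" if any(abs(m) > threshold for m in mods) else "absent"
    return label, mods


# Usage with made-up numbers: two diagnostic windows, the second raised by 60%.
ppm = [0.8, 0.9, 1.0, 1.2, 1.3, 3.0, 3.2]
intensity = [1.0, 2.0, 1.0, 4.0, 4.0, 0.5, 0.5]
label, mods = classify(ppm, intensity, windows=[(0.8, 1.0), (1.2, 1.3)],
                       controls=[4.0, 5.0])
```

In practice the control values would themselves come from reference samples, and the decision rule would normally be replaced by a trained model rather than a fixed cut-off.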

Classifying a Sample: By Amount of Diagnostic Species 

One aspect of the present invention pertains to a method of classifying a sample, said
method comprising the step of relating the amount of, or relative amount of, one or more
diagnostic species present in said sample with a predetermined condition.

One aspect of the present invention pertains to a method of classifying a sample from a
subject, said method comprising the step of relating the amount of, or relative amount of,
one or more diagnostic species present in said sample with a predetermined condition of
said subject.

One aspect of the present invention pertains to a method of classifying a sample, said
method comprising the step of relating the amount of, or relative amount of, one or more
diagnostic species present in said sample with the presence or absence of a
predetermined condition.

One aspect of the present invention pertains to a method of classifying a sample from a
subject, said method comprising the step of relating the amount of, or the relative
amount of, one or more diagnostic species present in said sample with the presence or
absence of a predetermined condition of said subject.

One aspect of the present invention pertains to a method of classifying a sample, said
method comprising the step of relating a modulation of the amount of, or relative amount
of, one or more diagnostic species present in said sample, as compared to a control
sample, with a predetermined condition.

One aspect of the present invention pertains to a method of classifying a sample from a
subject, said method comprising the step of relating a modulation of the amount of, or
relative amount of, one or more diagnostic species present in said sample, as compared
to a control sample, with a predetermined condition of said subject.

One aspect of the present invention pertains to a method of classifying a sample, said
method comprising the step of relating a modulation of the amount of, or relative amount
of, one or more diagnostic species present in said sample, as compared to a control
sample, with the presence or absence of a predetermined condition.

One aspect of the present invention pertains to a method of classifying a sample from a
subject, said method comprising the step of relating a modulation of the amount of, or
relative amount of, one or more diagnostic species present in said sample, as compared
to a control sample, with the presence or absence of a predetermined condition of said
subject.

Classifying a Subject: By Amount of Diagnostic Species

One aspect of the present invention pertains to a method of classifying a subject, said
method comprising the step of relating the amount of, or relative amount of, one or more
diagnostic species present in a sample from said subject with a predetermined condition
of said subject.

One aspect of the present invention pertains to a method of classifying a subject, said
method comprising the step of relating the amount of, or relative amount of, one or more
diagnostic species present in a sample from said subject with the presence or absence
of a predetermined condition of said subject.

One aspect of the present invention pertains to a method of classifying a subject, said
method comprising the step of relating a modulation of the amount of, or relative amount
of, one or more diagnostic species present in a sample from said subject, as compared to
a control sample, with a predetermined condition of said subject.

One aspect of the present invention pertains to a method of classifying a subject, said
method comprising the step of relating a modulation of the amount of, or relative amount
of, one or more diagnostic species present in a sample from said subject, as compared to
a control sample, with the presence or absence of a predetermined condition of said
subject.
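A comparable sketch for diagnostic species: measured amounts are first converted to relative amounts by dividing by a reference amount, then compared with a control sample as fold changes. The species names, the choice of normaliser and the 1.5-fold cut-off are illustrative assumptions only; the specification does not prescribe any of them.

```python
# Minimal sketch (hypothetical species, normaliser and cut-off): classify a
# subject from relative amounts of diagnostic species versus a control sample.

def relative_amounts(amounts, reference):
    """Divide each species' amount by a reference amount."""
    return {name: value / reference for name, value in amounts.items()}


def fold_changes(sample_rel, control_rel):
    """Ratio of each species' relative amount in the sample to that in the control."""
    return {name: sample_rel[name] / control_rel[name] for name in sample_rel}


def classify_subject(sample, sample_ref, control, control_ref, cutoff=1.5):
    fc = fold_changes(relative_amounts(sample, sample_ref),
                      relative_amounts(control, control_ref))
    # "Modulated" if any species is raised or lowered by at least `cutoff`-fold.
    modulated = any(r >= cutoff or r <= 1.0 / cutoff for r in fc.values())
    return ("condition present" if modulated else "condition absent"), fc


label, fc = classify_subject({"species_A": 6.0, "species_B": 2.0}, 2.0,
                             {"species_A": 2.0, "species_B": 1.0}, 1.0)
```

Here species_A doubles its relative amount while the reference also doubles, so only the normalised ratio (1.5-fold) is used for the decision.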

Diagnosing a Subject: By Amount of Diagnostic Species

One aspect of the present invention pertains to a method of diagnosing a predetermined
condition of a subject, said method comprising the step of relating the amount of, or
relative amount of, one or more diagnostic species present in a sample from said subject
with said predetermined condition of said subject.

One aspect of the present invention pertains to a method of diagnosing a predetermined
condition of a subject, said method comprising the step of relating the amount of, or
relative amount of, one or more diagnostic species present in a sample from said subject
with the presence or absence of said predetermined condition of said subject.

One aspect of the present invention pertains to a method of diagnosing a predetermined
condition of a subject, said method comprising the step of relating a modulation of the
amount of, or relative amount of, one or more diagnostic species present in a sample
from said subject, as compared to a control sample, with said predetermined condition of
said subject.

One aspect of the present invention pertains to a method of diagnosing a predetermined
condition of a subject, said method comprising the step of relating a modulation of the
amount of, or relative amount of, one or more diagnostic species present in a sample
from said subject, as compared to a control sample, with the presence or absence of
said predetermined condition of said subject.

Classifying a Sample: By Mathematical Modelling

One aspect of the present invention pertains to a method of classification, said method 
comprising the steps of: 

(a) forming a predictive mathematical model by applying a modelling method to
modelling data; 

(b) using said model to classify a test sample. 

One aspect of the present invention pertains to a method of classifying a test sample,
said method comprising the steps of:

(a) forming a predictive mathematical model by applying a modelling method to
modelling data; 

wherein said modelling data comprises a plurality of data sets for modelling 
samples of known class;

(b) using said model to classify said test sample as being a member of one of 
said known classes. 
One aspect of the present invention pertains to a method of classifying a test sample,
said method comprising the steps of: 

(a) forming a predictive mathematical model by applying a modelling method to 
modelling data; 

wherein said modelling data comprises at least one data set for each of a plurality 
of modelling samples; 

wherein said modelling samples define a class group consisting of a plurality of 
classes; 

wherein each of said modelling samples is of a known class selected from said 
class group; and, 

(b) using said model with a data set for said test sample to classify said test 
sample as being a member of one class selected from said class group. 
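The two steps of this aspect, (a) forming a model from modelling data of known class and (b) using it with a test data set to assign one class from the class group, can be sketched with a deliberately simple modelling method. Here a nearest-centroid classifier is swapped in as a stand-in for the multivariate methods (e.g., PCA, PLS-DA) discussed elsewhere; the function names, data values and class labels are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+

# Minimal sketch: nearest-centroid classification as a stand-in modelling method.

def form_model(data_sets, classes):
    """Step (a): the model is the mean data set (centroid) of each known class."""
    sums, counts = {}, {}
    for x, c in zip(data_sets, classes):
        acc = sums.setdefault(c, [0.0] * len(x))
        sums[c] = [s + v for s, v in zip(acc, x)]
        counts[c] = counts.get(c, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def classify(model, test_data_set):
    """Step (b): assign the class whose centroid is nearest to the test data set."""
    return min(model, key=lambda c: dist(model[c], test_data_set))


# Hypothetical modelling data: two variables per sample, two known classes.
modelling_data = [[1.0, 0.0], [1.2, 0.1], [0.0, 1.0], [0.1, 1.1]]
known_classes = ["absent", "absent", "present", "present"]
model = form_model(modelling_data, known_classes)
```

Any modelling method that exposes the same "form, then use" split could be dropped in place of `form_model` and `classify`.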

One aspect of the present invention pertains to a method of classification, said method
comprising the step of: 

using a predictive mathematical model; 

wherein said model is formed by applying a modelling method to modelling data;
to classify a test sample. 

One aspect of the present invention pertains to a method of classifying a test sample, 
said method comprising the step of: 

using a predictive mathematical model; 

wherein said model is formed by applying a modelling method to modelling data;
wherein said modelling data comprises a plurality of data sets for modelling
samples of known class;

to classify said test sample as being a member of one of said known classes. 

One aspect of the present invention pertains to a method of classifying a test sample, 
said method comprising the step of:

using a predictive mathematical model; 

wherein said model is formed by applying a modelling method to modelling data;
wherein said modelling data comprises at least one data set for each of a plurality 
of modelling samples; 

wherein said modelling samples define a class group consisting of a plurality of
classes;

wherein each of said modelling samples is of a known class selected from said
class group;

with a data set for said test sample to classify said test sample as being a 
member of one class selected from said class group.

Classifying a Subject: By Mathematical Modelling

One aspect of the present invention pertains to a method of classification, said method 
comprising the steps of: 

(a) forming a predictive mathematical model by applying a modelling method to
modelling data; 

(b) using said model to classify a subject. 

One aspect of the present invention pertains to a method of classifying a subject, said 
method comprising the steps of: 

(a) forming a predictive mathematical model by applying a modelling method to
modelling data; 

wherein said modelling data comprises a plurality of data sets for modelling 
samples of known class;

(b) using said model to classify a test sample from said subject as being a 
member of one of said known classes, and thereby classify said subject. 

One aspect of the present invention pertains to a method of classifying a subject, said
method comprising the steps of: 

(a) forming a predictive mathematical model by applying a modelling method to
modelling data; 

wherein said modelling data comprises at least one data set for each of a plurality 
of modelling samples; 

wherein said modelling samples define a class group consisting of a plurality of
classes; 

wherein each of said modelling samples is of a known class selected from said
class group; and, 

(b) using said model with a data set for a test sample from said subject to classify
said test sample as being a member of one class selected from said class group, and
thereby classify said subject.
One aspect of the present invention pertains to a method of classification, said method
comprising the step of: 

using a predictive mathematical model; 

wherein said model is formed by applying a modelling method to modelling data;
to classify a subject.

One aspect of the present invention pertains to a method of classifying a subject, said 
method comprising the step of: 

using a predictive mathematical model;

wherein said model is formed by applying a modelling method to modelling data;

wherein said modelling data comprises a plurality of data sets for modelling 
samples of known class; 

to classify a test sample from said subject as being a member of one of said 
known classes, and thereby classify said subject. 

One aspect of the present invention pertains to a method of classifying a subject, said
method comprising the step of: 

using a predictive mathematical model;

wherein said model is formed by applying a modelling method to modelling data;
wherein said modelling data comprises at least one data set for each of a plurality
of modelling samples;

wherein said modelling samples define a class group consisting of a plurality of
classes;

wherein each of said modelling samples is of a known class selected from said
class group;

with a data set for a test sample from said subject to classify said test sample as 
being a member of one class selected from said class group, and thereby classify said 
subject. 

Diagnosing a Subject: By Mathematical Modelling

One aspect of the present invention pertains to a method of diagnosis, said method 
comprising the steps of: 

(a) forming a predictive mathematical model by applying a modelling method to 
modelling data;

(b) using said model to diagnose a subject. 
One aspect of the present invention pertains to a method of diagnosing a predetermined
condition of a subject, said method comprising the steps of: 

(a) forming a predictive mathematical model by applying a modelling method to
modelling data;

wherein said modelling data comprises a plurality of data sets for modelling 
samples of known class;

(b) using said model to classify a test sample from said subject as being a 
member of one of said known classes, and thereby diagnose said subject.

One aspect of the present invention pertains to a method of diagnosing a predetermined 
condition of a subject, said method comprising the steps of: 

(a) forming a predictive mathematical model by applying a modelling method to 
modelling data; 

wherein said modelling data comprises at least one data set for each of a plurality
of modelling samples;

wherein said modelling samples define a class group consisting of a plurality of 
classes; 

wherein each of said modelling samples is of a known class selected from said 
class group; and,

(b) using said model with a data set for a test sample from said subject to classify 
said test sample as being a member of one class selected from said class group, and
thereby diagnose said subject.

One aspect of the present invention pertains to a method of diagnosis, said method
comprising the step of: 

using a predictive mathematical model; 

wherein said model is formed by applying a modelling method to modelling data;
to diagnose a subject.

One aspect of the present invention pertains to a method of diagnosing a predetermined 
condition of a subject, said method comprising the step of:

using a predictive mathematical model;

wherein said model is formed by applying a modelling method to modelling data;
wherein said modelling data comprises a plurality of data sets for modelling
samples of known class;
to classify a test sample from said subject as being a member of one of said
known classes, and thereby diagnose said subject.

One aspect of the present invention pertains to a method of diagnosing a predetermined
condition of a subject, said method comprising the step of:

using a predictive mathematical model;

wherein said model is formed by applying a modelling method to modelling data;
wherein said modelling data comprises at least one data set for each of a plurality 
of modelling samples; 

wherein said modelling samples define a class group consisting of a plurality of
classes;

wherein each of said modelling samples is of a known class selected from said
class group; 

with a data set for a test sample from said subject to classify said test sample as
being a member of one class selected from said class group, and thereby diagnose said
subject.

Certain Preferred Embodiments 

In one embodiment, said sample is a sample from a subject, and said predetermined
condition is a predetermined condition of said subject.

In one embodiment, said test sample is a test sample from a subject, and said
predetermined condition is a predetermined condition of said subject.

In one embodiment, said one or more predetermined diagnostic spectral windows are 
associated with one or more diagnostic species. 

In one embodiment, said relating step involves the use of a predictive mathematical
model, for example, as described herein.

The nature of a predictive mathematical model is determined primarily by the modelling
method employed when forming that model. 

In one embodiment, said modelling method is a multivariate statistical analysis modelling
method. 
In one embodiment, said modelling method is a multivariate statistical analysis modelling
method which employs a pattern recognition method. 

In one embodiment, said modelling method is, or employs, PCA.
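As a minimal sketch of PCA as it might be applied to a matrix of spectra (rows are samples, columns are spectral variables such as integrated windows), scores and loadings can be obtained from a singular value decomposition of the mean-centred data. The use of NumPy, the function name `pca` and the toy data are assumptions for illustration, not the specific implementation used with this invention.

```python
import numpy as np


def pca(X, n_components=2):
    """PCA via SVD of the mean-centred data matrix.

    X: (n_samples, n_variables), e.g. one row of binned NMR intensities per sample.
    Returns scores (n_samples, k), loadings (n_variables, k) and the fraction
    of total variance explained by each retained component.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    explained = s**2 / np.sum(s**2)
    return scores, loadings, explained[:n_components]


# Hypothetical data: three "spectra" of two variables lying on one line.
X = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
scores, loadings, explained = pca(X, n_components=1)
```

Because the toy data vary along a single direction, the first component carries essentially all of the variance; real spectral data would typically be scaled and binned before this step.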

In one embodiment, said modelling method is, or employs, PLS.

In one embodiment, said modelling method is, or employs, PLS-DA.
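PLS-DA can be sketched as PLS regression onto a class-membership variable. The sketch below assumes two classes coded as -1 (absence) and +1 (presence), fits a PLS1 model with the NIPALS algorithm in NumPy, and assigns the class by the sign of the predicted value. The coding, the zero threshold, the number of components and all function names are assumptions for illustration.

```python
import numpy as np


def pls1_fit(X, y, n_components=1):
    """PLS1 regression by NIPALS; returns (x_mean, y_mean, coefficient vector)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean            # residual X block and y vector
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f                          # weight: covariance of X with y
        norm = np.linalg.norm(w)
        if norm < 1e-12:                     # y already fully explained
            break
        w /= norm
        t = E @ w                            # score
        tt = t @ t
        p = E.T @ t / tt                     # X loading
        qa = (f @ t) / tt                    # y loading
        E = E - np.outer(t, p)               # deflate X and y
        f = f - qa * t
        W.append(w)
        P.append(p)
        q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)      # coefficients in original X space
    return x_mean, y_mean, B


def pls1_predict(model, X):
    x_mean, y_mean, B = model
    return (np.asarray(X, dtype=float) - x_mean) @ B + y_mean


def plsda_classify(model, X, labels=("absent", "present")):
    """Two-class PLS-DA: classes coded -1/+1, assigned by the sign of the prediction."""
    return [labels[1] if v > 0 else labels[0] for v in pls1_predict(model, X)]


# Hypothetical modelling data: 4 samples x 3 spectral variables.
X = [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1], [0.1, 1.0, 0.9], [0.0, 0.9, 1.0]]
y = [-1, -1, 1, 1]                           # -1 = absence, +1 = presence
model = pls1_fit(X, y, n_components=1)
```

The number of components would normally be chosen by cross-validation rather than fixed in advance.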

In one embodiment, said modelling method includes a step of data filtering. 

In one embodiment, said modelling method includes a step of orthogonal data filtering.

In one embodiment, said modelling method includes a step of OSC.
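The idea behind OSC, removing systematic variation in X that is orthogonal (uncorrelated) to the class or response variable y, can be shown with a simplified, non-iterative sketch: take the dominant principal component score, remove its projection onto y, and deflate X by that y-orthogonal component. This is a simplification of the published iterative OSC algorithms, written with NumPy; the function name `osc_filter`, its return values and the toy data are hypothetical.

```python
import numpy as np


def osc_filter(X, y, n_components=1):
    """Simplified, non-iterative OSC-style filter.

    For each component: take the first principal-component score of the
    (deflated, mean-centred) X, remove its projection onto centred y so it is
    orthogonal to y, and subtract that y-orthogonal component from X.
    Returns the filtered, mean-centred X, the removed scores and loadings.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    scores, loadings = [], []
    for _ in range(n_components):
        U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
        t = U[:, 0] * s[0]                   # dominant variation in X
        t = t - yc * (yc @ t) / (yc @ yc)    # make it orthogonal to y
        p = Xc.T @ t / (t @ t)               # loading of the y-orthogonal score
        Xc = Xc - np.outer(t, p)             # remove it from X
        scores.append(t)
        loadings.append(p)
    return Xc, np.array(scores).T, np.array(loadings).T


# Hypothetical data: 8 samples x 5 variables, two classes coded 0/1.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))
y = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
X_filtered, T, P = osc_filter(X, y, n_components=2)
```

The filtered matrix would then be passed to the modelling method (e.g., PLS-DA); the same loadings must be applied to test data before prediction.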

In one embodiment, said model takes account of one or more diagnostic species.

The precise details of the predictive mathematical model are determined primarily by the
modelling data (e.g., modelling data sets). 

In one embodiment, said modelling data comprise spectral data. 

In one embodiment, said modelling data comprise both spectral data and non-spectral
data (and are referred to as "composite data").

In one embodiment, said modelling data comprise NMR spectral data. 

In one embodiment, said modelling data comprise both NMR spectral data and non-NMR 
spectral data. 

In one embodiment, said NMR spectral data comprises ¹H NMR spectral data and/or ¹³C
NMR spectral data.

In one embodiment, said NMR spectral data comprises ¹H NMR spectral data.
In one embodiment, said modelling data are spectra.

In one embodiment, said modelling data comprises a plurality of data sets for modelling
samples of known class. 

In one embodiment, said modelling data comprises at least one data set for each of a 
plurality of modelling samples. 

In one embodiment, said modelling data comprises exactly one data set for each of a 
plurality of modelling samples. 

In one embodiment, said using step is: using said model with a data set for said test
sample to classify said test sample as being a member of one class selected from said
class group.

In one embodiment, each of said data sets comprises spectral data. 

In one embodiment, each of said data sets comprises both spectral data and non-spectral
data (and is referred to as a "composite data set").
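A composite data set might be assembled by concatenating each sample's spectral variables with its non-spectral (e.g., clinical) variables. Because such blocks are measured on very different scales, the sketch below autoscales every column to zero mean and unit variance; that scaling choice, the function name and the toy values are assumptions, and other block-weighting schemes are equally plausible.

```python
import numpy as np


def composite_data_sets(spectral, non_spectral):
    """Concatenate spectral and non-spectral (e.g. clinical) variables per sample,
    then autoscale each column to zero mean and unit variance (an assumed choice)."""
    X = np.hstack([np.asarray(spectral, dtype=float),
                   np.asarray(non_spectral, dtype=float)])
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0                      # leave constant columns unscaled
    return (X - mean) / std


# Hypothetical: 3 samples, 2 spectral variables plus 1 clinical variable (e.g. age).
Z = composite_data_sets([[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]],
                        [[60.0], [70.0], [65.0]])
```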

In one embodiment, each of said data sets comprises NMR spectral data. 

In one embodiment, each of said data sets comprises both NMR spectral data and non-NMR
spectral data.

In one embodiment, said NMR spectral data comprises ¹H NMR spectral data and/or ¹³C
NMR spectral data.

In one embodiment, said NMR spectral data comprises ¹H NMR spectral data.

In one embodiment, each of said data sets comprises a spectrum. 

In one embodiment, each of said data sets comprises a ¹H NMR spectrum and/or a
¹³C NMR spectrum.
In one embodiment, each of said data sets comprises a ¹H NMR spectrum.

In one embodiment, each of said data sets is a spectrum. 

In one embodiment, each of said data sets is a ¹H NMR spectrum and/or a ¹³C NMR
spectrum.

In one embodiment, each of said data sets is a ¹H NMR spectrum.

In one embodiment, said non-spectral data is non-spectral clinical data.

In one embodiment, said non-NMR spectral data is non-spectral clinical data. 

In one embodiment, said class group comprises classes associated with said 
predetermined condition (e.g., presence, absence, degree, etc.). 

In one embodiment, said class group comprises exactly two classes.

In one embodiment, said class group comprises exactly two classes: presence of said 
predetermined condition; and absence of said predetermined condition.

Classification, Classifying, and Classes

As discussed above, many aspects of the present invention pertain to methods of 
classifying things, for example, a sample, a subject, etc. In such methods, the thing is 
classified, that is, it is associated with an outcome, or, more specifically, it is assigned
membership to a particular class (i.e., it is assigned class membership), and is said "to
be of," "to belong to," or "to be a member of," a particular class.

Classification is made (i.e., class membership is assigned) on the basis of diagnostic
criteria. The step of considering such diagnostic criteria, and assigning class
membership, is described by the word "relating," for example, in the phrase "relating
NMR spectral intensity at one or more predetermined diagnostic spectral windows for
said sample (i.e., diagnostic criteria) with the presence or absence of a predetermined
condition (i.e., class membership)."
For example, "presence of a predetermined condition" is one class, and "absence of a
predetermined condition" is another class; in such cases, classification (i.e., assignment
to one of these classes) is equivalent to diagnosis.

Samples 

As discussed above, many aspects of the present invention pertain to methods which 
involve a sample, e.g., a particular sample under study ("study sample").

In general, a sample may be in any suitable form. For methods which involve spectra
obtained or recorded for a sample, the sample may be in any form which is compatible
with the particular type of spectroscopy, and therefore may be, as appropriate,
homogeneous or heterogeneous, comprising one or a combination of, for example, a
gas, a liquid, a liquid crystal, a gel, and a solid.

Samples which originate from an organism (e.g., subject, patient) may be in vivo; that is,
not removed from or separated from the organism. Thus, in one embodiment, said
sample is an in vivo sample. For example, the sample may be circulating blood, which is
"probed" in situ, in vivo, for example, using NMR methods.

Samples which originate from an organism may be ex vivo; that is, removed from or
separated from the organism (e.g., an ex vivo blood sample, an ex vivo urine sample). 
Thus, in one embodiment, said sample is an ex vivo sample. 

In one embodiment, said sample is an ex vivo blood or blood-derived sample. 
In one embodiment, said sample is an ex vivo blood sample. 
In one embodiment, said sample is an ex vivo plasma sample. 
In one embodiment, said sample is an ex vivo serum sample. 
In one embodiment, said sample is an ex vivo urine sample.

In one embodiment, said sample is removed from or separated from an/said organism, 
and is not returned to said organism (e.g., an ex vivo blood sample, an ex vivo urine 
sample). 

In one embodiment, said sample is removed from or separated from an/said organism,
and is returned to said organism (i.e., "in transit") (e.g., as with dialysis methods). Thus,
in one embodiment, said sample is an ex vivo in transit sample.

Examples of samples include: 

a whole organism (living or dead, e.g., a living human);

a part or parts of an organism (e.g., a tissue sample, an organ); 

a pathological tissue such as a tumour; 

a tissue homogenate (e.g. a liver microsome fraction); 

an extract prepared from an organism or a part of an organism (e.g., a tissue
sample extract, such as a perchloric acid extract);

an infusion prepared from an organism or a part of an organism (e.g., tea, traditional
Chinese herbal medicines);

an in vitro tissue such as a spheroid; 

a suspension of a particular cell type (e.g., hepatocytes);

an excretion, secretion, or emission from an organism (especially a fluid); 

material which is administered and collected (e.g., dialysis fluid); 

material which develops as a function of pathology (e.g., a cyst, blisters); and

supernatant from a cell culture. 

Examples of fluid samples include, for example, blood plasma, blood serum, whole 
blood, urine, (gall bladder) bile, cerebrospinal fluid, milk, saliva, mucus, sweat, gastric 
juice, pancreatic juice, seminal fluid, prostatic fluid, seminal vesicle fluid, seminal plasma, 
amniotic fluid, foetal fluid, follicular fluid, synovial fluid, aqueous humour, ascitic fluid,
cystic fluid, blister fluid, and cell suspensions; and extracts thereof. 

Examples of tissue samples include liver, kidney, prostate, brain, gut, blood, blood cells, 
skeletal muscle, heart muscle, lymphoid, bone, cartilage, and reproductive tissues. 

Still other examples of samples include air (e.g., exhaust), water (e.g., seawater,
groundwater, wastewater, e.g., from factories), liquids from the food industry (e.g., juices,
wines, beers, other alcoholic drinks, tea, milk), solid-like food samples (e.g., chocolate,
pastes, fruit peel, fruit and vegetable flesh such as banana, leaves, meats, whether
cooked or raw, etc.).

A few preferred samples are discussed below. 
Blood, Plasma, Serum

Blood is the fluid that circulates in the blood vessels of the body, that is, the fluid that is
circulated through the heart, arteries, veins, and capillaries. The function of the blood
and the circulation is to service the needs of other tissues: to transport oxygen and
nutrients to the tissues, to transport carbon dioxide and various metabolic waste
products away, to conduct hormones from one part of the body to another, and in
general to maintain an appropriate environment in all tissue fluids for optimal survival
and function of the cells.

Blood consists of a liquid component, plasma, and a solid component, cells and formed
elements (e.g., erythrocytes, leukocytes, and platelets), suspended within it.
Erythrocytes, or red blood cells, account for about 99.9% of the cells suspended in
human blood. They contain hemoglobin, which is involved in the transport of oxygen and
carbon dioxide. Leukocytes, or white blood cells, account for about 0.1% of the cells
suspended in human blood. They play a role in the body's defense and repair
mechanisms, and may be classified as agranular or granular. Agranular leukocytes
include monocytes and small, medium and large lymphocytes, with small lymphocytes
accounting for about 20-25% of the leukocytes in human blood. T cells and B cells are
important examples of lymphocytes. Three classes of granular leukocytes are known:
neutrophils, eosinophils, and basophils, with neutrophils accounting for about 60% of the
leukocytes in human blood. Platelets (i.e., thrombocytes) are not cells but small
spindle-shaped or rodlike bodies about 3 microns in length which occur in large numbers
in circulating blood. Platelets play a major role in clot formation.

Plasma is the liquid component of blood. It serves as the primary medium for the
transport of materials among cellular, tissue, and organ systems and their various 
external environments, and it is essential for the maintenance of normal hemostasis. 
One of the most important functions of many of the major tissue and organ systems is to 
maintain specific components of plasma within acceptable physiological limits. 

Plasma is the residual fluid of blood which remains after removal of suspended cells and
formed elements. Whole blood is typically processed to remove suspended cells and
formed elements (e.g., by centrifugation) to yield blood plasma. Serum is the fluid which
is obtained after blood has been allowed to clot and the clot removed. Blood serum may
be obtained by forming a blood clot (e.g., optionally initiated by the addition of thrombin
and calcium ion) and subsequently removing the clot (e.g., by centrifugation). Serum
and plasma differ primarily in their content of fibrinogen and several components which
are removed in the clotting process. Plasma may be effectively prevented from clotting
by the addition of an anti-coagulant (e.g., sodium citrate, heparin, lithium heparin) to
permit handling or storage. Plasma is composed primarily of water (approximately 90%),
with approximately 7% proteins, 0.9% inorganic salts, and smaller amounts of
carbohydrates, lipids, and organic salts.

The term "blood sample," as used herein, pertains to a sample of whole blood.

The term "blood-derived sample," as used herein, pertains to an ex vivo sample derived 
from the blood of the subject under study. 

Examples of blood and blood-derived samples include, but are not limited to, whole
blood (WB), blood plasma (including, e.g., fresh frozen plasma (FFP)), blood serum,
blood fractions, plasma fractions, serum fractions, blood fractions comprising red blood
cells (RBC), platelets (PLT), leukocytes, etc., and cell lysates including fractions thereof
(for example, cells, such as red blood cells, white blood cells, etc., may be harvested and
lysed to obtain a cell lysate).

Methods for obtaining, preparing, handling, and storing blood and blood-derived samples 
(e.g., plasma, serum) are well known in the art. Typically, blood is collected from 
subjects using conventional techniques (e.g., from the ante-cubital fossa), typically pre- 
prandially. 

For use in the methods described herein, the method used to prepare the blood fraction
(e.g., serum) should be reproduced as carefully as possible from one subject to the next;
it is important that the same or similar procedure be used for all subjects. It may be
preferable to prepare serum (as opposed to plasma or other blood fractions) for two
reasons: (a) the preparation of serum is more reproducible from individual to individual
than the preparation of plasma, and (b) the preparation of plasma requires the addition of
anticoagulants (e.g., EDTA, citrate, or heparin) which will be visible in the NMR
metabonomic profile and may reduce the data density available.

A typical method for the preparation of serum suitable for analysis by the methods
described herein is as follows: 10 mL of blood is drawn from the antecubital fossa of an
individual who has fasted overnight, using an 18 gauge butterfly needle. The blood is
immediately dispensed into a polypropylene tube and allowed to clot at room
temperature for 3 hours. The clotted blood is then subjected to centrifugation (e.g.,
4,500 x g for 5 minutes) and the serum supernatant removed to a clean tube. If
necessary, the centrifugation step can be repeated to ensure the serum is efficiently
separated from the clot. The serum supernatant may be analysed "fresh" or it may be
stored frozen for later analysis.

A typical method for the preparation of plasma suitable for analysis by the methods
described herein is as follows: High quality platelet-poor plasma is made by drawing the
blood from the antecubital fossa using a 19 gauge butterfly needle without the use of a
tourniquet. The first 2 mL of blood drawn is discarded and the remainder is
rapidly mixed and aliquoted into Diatube H anticoagulant tubes (Becton Dickinson). After
gentle mixing by inversion, the anticoagulated blood is cooled on ice for 15 minutes, then
subjected to centrifugation to pellet the cells and platelets (approximately 1,200 x g for
15 minutes). The platelet-poor plasma supernatant is carefully removed, drawing off
the middle third of the supernatant and discarding the upper third (which may contain
floating platelets) and the lower third, which is too close to the readily disturbed platelet
layer on the top of the cell pellet. The plasma may then be aliquoted and stored frozen
at -20°C or colder, and then thawed when required for assay.

Samples may be analysed immediately ("fresh"), or may be frozen and stored (e.g., at
-80°C) ("fresh frozen") for future analysis. If frozen, samples are completely thawed prior
to NMR analysis.

In one embodiment, said sample is a blood sample or a blood-derived sample. 

In one embodiment, said sample is a blood sample. 

In one embodiment, said sample is a blood plasma sample. 

In one embodiment, said sample is a blood serum sample. 

Urine 

The composition of urine is complex and highly variable both between species and within
species according to lifestyle. A wide range of organic acids and bases, simple sugars
and polysaccharides, heterocycles, polyols, low molecular weight proteins and
polypeptides are present together with inorganic species such as Na⁺, K⁺, Ca²⁺, Mg²⁺,
HCO₃⁻, SO₄²⁻ and phosphates.

The term "urine," as used herein, pertains to whole (or intact) urine, whether in vivo
(e.g., foetal urine) or ex vivo (e.g., obtained by excretion or catheterisation).

The term "urine-derived sample," as used herein, pertains to an ex vivo sample derived
from the urine of the subject under study (e.g., obtained by dilution, concentration,
addition of additives, solvent- or solid-phase extraction, etc.). Analysis may be
performed using, for example, fresh urine; urine which has been frozen and then thawed;
or urine which has been dried (e.g., freeze-dried) and then reconstituted, e.g., with
water or D₂O.

Methods for the collection, handling, storage, and pre-analysis preparation of many
classes of sample, especially biological samples (e.g., biofluids), are well known in the
art. See, for example, Linden et al., 1999.

In one embodiment, said sample is a urine sample or a urine-derived sample.

In one embodiment, said sample is a urine sample.

Organisms, Subjects, Patients

As discussed above, in many cases, samples are, or originate from, or are drawn or
derived from, an organism (e.g., subject, patient). In such cases, the organism may be
as defined below.

In one embodiment, the organism is a prokaryote (e.g., bacteria) or a eukaryote (e.g.,
protoctista, fungi, plants, animals).




In one embodiment, the organism is a protoctista, an alga, or a protozoan. 
In one embodiment, the organism is a plant, an angiosperm, a dicotyledon, a
monocotyledon, a gymnosperm, a conifer, a ginkgo, a cycad, a fern, a horsetail, a
clubmoss, a liverwort, or a moss.

In one embodiment, the organism is an animal.

In one embodiment, the organism is a chordate, an invertebrate, an echinoderm (e.g.,
starfish, sea urchins, brittlestars), an arthropod, an annelid (segmented worms)
(e.g., earthworms, lugworms, leeches), a mollusk (cephalopods (e.g., squids, octopi),
pelecypods (e.g., oysters, mussels, clams), gastropods (e.g., snails, slugs)), a nematode
(round worms), a platyhelminthes (flatworms) (e.g., planarians, flukes, tapeworms), a
cnidaria (e.g., jelly fish, sea anemones, corals), or a porifera (e.g., sponges).

In one embodiment, the organism is an arthropod, an insect (e.g., beetles, butterflies,
moths), a chilopoda (centipedes), a diplopoda (millipedes), a crustacean (e.g., shrimps,
crabs, lobsters), or an arachnid (e.g., spiders, scorpions, mites).

In one embodiment, the organism is a chordate, a vertebrate, a mammal, a bird, a reptile
(e.g., snakes, lizards, crocodiles), an amphibian (e.g., frogs, toads), a bony fish (e.g.,
salmon, plaice, eel, lungfish), a cartilaginous fish (e.g., sharks, rays), or a jawless fish
(e.g., lampreys, hagfish).

In one embodiment, the organism (e.g., subject, patient) is a mammal.

In one embodiment, the organism (e.g., subject, patient) is a placental mammal,
a marsupial (e.g., kangaroo, wombat), a monotreme (e.g., duckbilled platypus), a rodent
(e.g., a guinea pig, a hamster, a rat, a mouse), murine (e.g., a mouse), a lagomorph
(e.g., a rabbit), avian (e.g., a bird), canine (e.g., a dog), feline (e.g., a cat), equine (e.g., a
horse), porcine (e.g., a pig), ovine (e.g., a sheep), bovine (e.g., a cow), a primate, simian
(e.g., a monkey or ape), a monkey (e.g., marmoset, baboon), an ape (e.g., gorilla,
chimpanzee, orangutan, gibbon), or a human.

Furthermore, the organism may be in any of its forms of development, for example, a
spore, a seed, an egg, a larva, a pupa, or a foetus.


In one embodiment, the organism (e.g., subject, patient) is a human.
The subject (e.g., a human) may be characterised by one or more criteria, for example,
sex, age (e.g., 40 years or more, 50 years or more, 60 years or more, etc.), ethnicity,
medical history, lifestyle (e.g., smoker, non-smoker), hormonal status (e.g.,
pre-menopausal, post-menopausal), etc.

The term "population," as used herein, refers to a group of organisms (e.g., subjects,
patients). If desired, a population (e.g., of humans) may be selected according to one or
more of the criteria listed above.
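Selecting a population by such criteria amounts to filtering subject records. The Python sketch below is illustrative only: the `Subject` record, its field names, and the `select_population` helper are hypothetical and are not part of this specification. Criteria are passed as predicates so that any combination of sex, age, lifestyle, or hormonal status can be applied.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Subject:
    # Hypothetical record; field names are illustrative assumptions.
    subject_id: str
    sex: str                                 # e.g. "F" or "M"
    age: int                                 # years
    smoker: bool
    menopausal_status: Optional[str] = None  # e.g. "pre", "post", or None


def select_population(subjects: Iterable[Subject],
                      *criteria: Callable[[Subject], bool]) -> List[Subject]:
    """Return the subjects that satisfy every supplied criterion.
    With no criteria, every subject is returned."""
    return [s for s in subjects if all(c(s) for c in criteria)]


cohort = [
    Subject("S1", "F", 62, False, "post"),
    Subject("S2", "F", 45, True, "pre"),
    Subject("S3", "M", 70, False),
]

# Example: post-menopausal women aged 50 years or more.
selected = select_population(
    cohort,
    lambda s: s.sex == "F",
    lambda s: s.age >= 50,
    lambda s: s.menopausal_status == "post",
)
print([s.subject_id for s in selected])  # ['S1']
```

Passing predicates rather than fixed keyword filters keeps the helper agnostic to which criteria a given study uses.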

Conditions 

As discussed above, many methods of the present invention involve assigning class 
membership, for example, to one of one or more classes, for example, to one of the two 
classes: (i) presence of a predetermined condition, or (ii) absence of a predetermined 
condition. 

A condition is "predetermined" in the sense that it is the condition in respect of which the
invention is practised; a condition is predetermined by a step of selecting a condition for
consideration, study, etc.

As used herein, the term "condition" relates to a state which is, in at least one respect,
distinct from the state of normality, as determined by a suitable control population.

A condition may be pathological (e.g., a disease) or physiological (e.g., phenotype,
genotype, fasting, water load, exercise, hormonal cycles, e.g., oestrus, etc.).

Included among conditions is the state of "at risk of" a condition, "predisposition towards"
a condition, and the like, again as compared to the state of normality, as determined by
a suitable control population. In this way, osteoporosis, at risk of osteoporosis, and
predisposition towards osteoporosis are all conditions (and are also conditions
associated with osteoporosis).

Where the condition is the state of "at risk of," "predisposition towards," and the like, a
method of diagnosis may be considered to be a method of prognosis.
In this context, the phrases "at risk of," "predisposition towards," and the like, indicate a
probability of being classified/diagnosed (or being able to be classified/diagnosed) with
the predetermined condition which is greater (e.g., 1.5x, 2x, 5x, 10x, etc.) than for the
corresponding control. Often, a time period (e.g., within the next 5 years, 10 years, 20
years, etc.) is associated with the probability. For example, a subject who is 2x more
likely to be diagnosed with the predetermined condition within the next 5 years, as
compared to a suitable control, is "at risk of" that condition.
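The "at risk of" definition compares two probabilities over the same time window. A minimal Python sketch, assuming the subject's and the control's probabilities of diagnosis within that window are already known; the function names and the 2x default threshold are hypothetical choices for illustration (the text lists 1.5x, 2x, 5x, 10x, etc. only as examples).

```python
def relative_risk(p_subject: float, p_control: float) -> float:
    """Ratio of the subject's to the control's probability of diagnosis
    within the same time window (e.g., the next 5 years)."""
    if not (0.0 < p_control <= 1.0 and 0.0 <= p_subject <= 1.0):
        raise ValueError("probabilities must lie in [0, 1], with control > 0")
    return p_subject / p_control


def is_at_risk(p_subject: float, p_control: float, threshold: float = 2.0) -> bool:
    """True if the subject's probability is at least `threshold` times the
    control's. The threshold is an assumed, user-chosen multiple."""
    return relative_risk(p_subject, p_control) >= threshold


# Worked example mirroring the text: 10% vs 5% probability of diagnosis within 5 years.
print(is_at_risk(0.10, 0.05))                 # True (2x the control)
print(is_at_risk(0.06, 0.05, threshold=1.5))  # False (1.2x the control)
```

Both probabilities must refer to the same time window; comparing a 5-year probability with a 10-year one would not give a meaningful ratio.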

Included among conditions is the degree of a condition, for example, the progress or
phase of a disease, or a recovery therefrom. For example, each of the different states in
the progress of a disease, or in the recovery from a disease, is itself a condition. In
this way, the degree of a condition may refer to how temporally advanced the condition
is. Another example of a degree of a condition relates to its maximum severity (e.g., a
disease can be classified as mild, moderate or severe). Yet another example of a
degree of a condition relates to the nature of the condition (e.g., anatomical site, extent
of tissue involvement, etc.).

Predetermined Condition 

In the present invention, said predetermined condition is associated with a bone disorder,
e.g., a condition associated with low bone mineral density, e.g., osteoporosis.

Functions of Bone 

The function of bone is to provide mechanical support for joints, tendons and ligaments,
to protect vital organs from damage and to act as a reservoir for calcium and phosphate
in the preservation of normal mineral homeostasis. Diseases of bone compromise these
functions, leading to clinical problems such as fracture, bone pain, bone deformity and
abnormalities of calcium and phosphate homeostasis.


Types of Bone

The normal skeleton contains two types of bone: cortical or compact bone, which makes
up most of the shafts (diaphyses) of the long bones such as the femur and tibia, and
trabecular or spongy bone, which makes up most of the vertebral bodies and the ends of
the long bones.
All bone is subject to continual turnover, with old bone being actively resorbed and new
bone being deposited. This turnover, or "remodelling," is essential for maintenance of
structural competence because continual loading results in the formation of numerous
microfractures in the bone matrix which, if left unchecked, would be weak points that
could seed catastrophic failures of the bone, i.e., clinically obvious fractures. Such a
process can be likened to a stone-chip on an automobile windscreen: the small crack
can act as a catalyst for the sudden failure of the entire structure.

Remodelling is therefore an essential process for maintaining bone strength. As the
bone is resorbed and re-deposited, the microfractures and structural imperfections are
removed.

Trabecular bone has a greater surface area than cortical bone and, because of this, is
remodeled more rapidly. Consequently, conditions associated with increased bone
turnover tend to affect trabecular bone more quickly and more profoundly than cortical
bone. Cortical bone is arranged in so-called Haversian systems, which consist of a
series of concentric lamellae of collagen fibres surrounding a central canal that contains
blood vessels. Nutrients reach the central parts of the bone by an interconnecting
system of canaliculi that run between osteocytes buried deep within bone matrix and
lining cells on the bone surface. Trabecular bone has a similar structure, but here the
lamellae run parallel to the bone surface, rather than concentrically as in cortical bone.

Bone Composition 

The organic component of bone matrix consists mainly of type I collagen: a fibrillar
protein formed from three protein chains wound together in a triple helix. Type I
collagen is laid down by bone-forming cells (osteoblasts) in organised parallel sheets
(lamellae). Type I collagen is a member of the collagen superfamily of related proteins,
which all share the unique structural motif of a left-handed triple helix. The presence of
this structural motif, which is responsible for the mechanical strength of collagen sheets,
imposes certain absolute requirements on the primary amino acid sequence of the
protein. If these requirements are not met, the protein cannot form the triple helix
characteristic of collagens. The most important structural requirements are the presence
of glycine residues at every third position (where the amino acid side chain points in
towards the center of the triple helix) and proline residues at every third position to
provide both structural rigidity and periodicity in the helix. Glycine is required
because it has the smallest side chain of all the proteinogenic amino acids (just a single
hydrogen atom) and so can be accommodated in the spatially constrained interior of the
helix. Proline is required because it is the only secondary amine among the 20
proteinogenic amino acids; this introduces a rigid 'bend' in the polypeptide, such that the
presence of proline residues at repeated intervals results in the adoption of a helical
conformation.
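The glycine requirement can be expressed as a simple check on a one-letter amino acid sequence. The Python sketch below is hypothetical and illustrative: it only tests whether glycine ('G') occupies every third position in some reading frame (the Gly-X-Y pattern), and the example sequences are made up rather than taken from any real collagen chain.

```python
from typing import Optional


def glycine_every_third(seq: str, offset: int = 0) -> bool:
    """True if every third residue, starting at `offset`, is glycine ('G').
    An empty selection (e.g. an empty sequence) returns False."""
    positions = range(offset, len(seq), 3)
    return len(positions) > 0 and all(seq[i] == "G" for i in positions)


def find_glycine_frame(seq: str) -> Optional[int]:
    """Return the first offset (0, 1 or 2) at which glycine repeats every
    third residue, or None if no such frame exists."""
    for offset in range(3):
        if glycine_every_third(seq, offset):
            return offset
    return None


# Made-up example sequences in one-letter code (not real collagen chains).
print(find_glycine_frame("GPPGPAGPP"))  # 0
print(find_glycine_frame("PPGPPGPPG"))  # 2
print(find_glycine_frame("GPPAPPGPP"))  # None (alanine breaks the repeat)
```

The check deliberately ignores the X and Y positions; it illustrates only the glycine periodicity described above.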

After synthesis, the collagen protein is subject to post-translational modifications
which are essential for the structural rigidity required in bone. Firstly, collagen becomes
hydroxylated on certain proline and lysine residues (i.e., to form hydroxyproline and
hydroxylysine, respectively). This hydroxylation depends on the activity of enzymes that
require vitamin C as a cofactor. Vitamin C deficiency leads to scurvy, a disease in which
bone and other collagen-containing tissues (such as skin, tendon and connective tissue)
are structurally weakened. This demonstrates the essential requirement for normal
collagen hydroxylation.

After deposition into the bone, the collagen chains become cross-linked by specialised
covalent bonds (pyridinium cross-links), which help to give bone its tensile strength.
These cross-links are formed by the action of enzymes on the hydroxylated amino acids
(particularly hydroxylysine) in the collagen. It is the absence of these cross-links which
results in the weakened state of the tissue in scurvy, when hydroxylation is inhibited by
the absence of sufficient vitamin C.

The biochemical structure of collagen is an important factor in the strength of bone, but
the pattern in which it is laid down is also important. The collagen fibres should be laid
down in ordered sheets for maximal tensile strength. However, when bone is formed
rapidly (for example in Paget's disease, or in bone metastases), the lamellae are laid
down in a disorderly fashion, giving rise to "woven bone," which is mechanically weak
and easily fractured.

Bone matrix also contains small amounts of other collagens and several non-collagenous
proteins and glycoproteins. The function of non-collagenous bone proteins
is unclear, but it is thought that they are involved in mediating the attachment of bone
cells to bone matrix, and in regulating bone cell activity during the process of bone
remodelling. The organic component of bone forms a framework (called osteoid) upon
which mineralisation occurs. After a lag phase of about 10 days, the matrix becomes
mineralised, as hydroxyapatite (Ca₁₀(PO₄)₆(OH)₂) crystals are deposited in the spaces
between collagen fibrils. Mineralisation confers upon bone the property of mechanical
rigidity, which complements the tensile strength and elasticity derived from bone
collagen.

Bone cell function and bone remodelling 

The mechanical integrity of the skeleton is maintained by the process of bone
remodelling, which occurs throughout life so that damaged bone can be replaced
by new bone. Remodelling can be divided into four phases: resorption, reversal,
formation, and quiescence (see, e.g., Raisz, 1988; Mundy, 1996). At any one time,
approximately 10% of the bone surface in the adult skeleton is undergoing active
remodelling, whereas the remaining 90% is quiescent.


Osteoclast formation and Differentiation 

Remodelling commences with the attraction of bone-resorbing cells (osteoclasts) to the
site which is to be resorbed. These are multinucleated phagocytic cells, rich in the enzyme
tartrate-resistant acid phosphatase, which are formed by fusion of precursors derived
from cells of the monocyte/macrophage lineage. Osteoclast formation and activation are
dependent on close contact between osteoclast precursors and bone marrow stromal
cells. Stromal cells secrete the cytokine M-CSF, which is essential for differentiation of
both osteoclasts and macrophages from a common precursor.

Mature osteoclasts form a tight seal over the bone surface and resorb bone by secreting
hydrochloric acid and proteolytic enzymes through the "ruffled border" into a space
beneath the osteoclast (Howship's lacuna). The hydrochloric acid secreted by
osteoclasts dissolves hydroxyapatite and allows proteolytic enzymes (mainly cathepsin
K and matrix metalloproteinases) to degrade collagen and other matrix proteins.
Deficiency of these proteins causes osteopetrosis, which is a disease associated with
increased bone mineral density and osteoclast dysfunction. After resorption is
completed, osteoclasts undergo programmed cell death (apoptosis) in the so-called
reversal phase, which heralds the start of bone formation.
Bone formation begins with the attraction of osteoblast precursors, which are derived from
mesenchymal stem cells in the bone marrow, to the bone surface. Although these cells
have the potential to differentiate into many cell types, including adipocytes, myocytes,
and chondrocytes, in the bone matrix they are driven towards an osteoblastic fate.
Mature osteoblasts are plump cuboidal cells, which are responsible for the production of
bone matrix. They are rich in the enzyme alkaline phosphatase and the protein
osteocalcin, which are used clinically as serum markers of osteoblast activity.

Osteoblasts lay down bone matrix which is initially unmineralised (osteoid), but which
subsequently becomes calcified after about 10 days to form mature bone. During bone
formation, some osteoblasts become trapped within the matrix and differentiate into
osteocytes, whereas others differentiate into flattened "lining cells" which cover the bone
surface. Osteocytes connect with one another and with lining cells on the bone surface
by an intricate network of cytoplasmic processes running through canaliculi in the bone
matrix. Osteocytes appear to act as sensors of mechanical strain in the skeleton, and
release signalling molecules such as prostaglandins and nitric oxide (NO), which
modulate the function of neighbouring bone cells.

Regulation of Bone Remodelling

Bone remodelling is a highly organised process, but the mechanisms which determine
where and when remodelling occurs are poorly understood. Mechanical stimuli and
areas of micro-damage are likely to be important in determining the sites at which
remodelling occurs in the normal skeleton. Increased bone remodelling may result from
local or systemic release of inflammatory cytokines, such as interleukin-1 and tumour
necrosis factor, in inflammatory diseases. Calciotropic hormones such as parathyroid
hormone (PTH) and 1,25-dihydroxyvitamin D act together to increase bone remodelling
on a systemic basis, allowing skeletal calcium to be mobilised for maintenance of plasma
calcium homeostasis. Bone remodelling is also increased by other hormones, such as
thyroid hormone and growth hormone, but suppressed by oestrogen, androgens and
calcitonin. There has been considerable study of the processes which regulate the bone
resorption side of the balance, but the factors regulating the rate of bone deposition are
considerably less well understood.



Bone Disorders

There is a range of disorders of bone which result from the failure to properly regulate
the metabolic processes which govern bone turnover (e.g., metabolic bone disorders).

Osteoporosis (OP) is the most prevalent metabolic bone disease. It is characterized by
reduced bone mineral density (BMD), deterioration of bone tissue, and increased risk of
fracture, e.g., of the hip, spine, and wrist. Many factors contribute to the pathogenesis of
osteoporosis, including poor diet, lack of exercise, smoking, and excessive alcohol intake.
Osteoporosis may also arise in association with inflammatory diseases such as
rheumatoid arthritis, endocrine diseases such as thyrotoxicosis, and certain drug
treatments such as glucocorticoids. However, there is also a strong genetic component
in the pathogenesis of osteoporosis.

Osteoporosis is a major health problem in developed countries. As many as 60% of
women suffer from osteoporosis, as defined by the World Health Organisation (WHO),
with half of these sufferers also having clinically relevant skeletal fractures. Thus,
roughly 1 in 3 women in developed countries will have a skeletal fracture due to
osteoporosis. This is a major cause of morbidity and mortality, leading to massive
health care costs (an estimated $14 billion per annum in the USA alone) (see, e.g.,
Melton et al., 1992).
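The 1-in-3 figure follows from multiplying the two proportions quoted above, taking 60% as the fraction of women with osteoporosis and one half as the fraction of those women who also have fractures:

```latex
P(\text{fracture due to OP})
  = \underbrace{0.60}_{\text{women with OP}}
    \times \underbrace{0.5}_{\text{of those, with fracture}}
  = 0.30 \approx \tfrac{1}{3}
```

The "1 in 3" figure is therefore a rounding of 30%.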

Osteopetrosis, the opposite of osteoporosis, is characterised by excessive bone mineral
density. It is, however, much rarer than osteoporosis, with as few as 1 in 25,000 women
affected.

After osteoporosis, the next most prevalent bone disease is osteoarthritis. Osteoarthritis
(OA) is the most common form of arthritis in adults, with symptomatic disease affecting
roughly 10% of the US population over the age of 30 (see, e.g., Felson et al., 1998).
Because OA affects the weight-bearing joints of the knee and hip more frequently than
other joints, osteoarthritis accounts for more physical disability among the elderly than
any other disease (see, e.g., Guccione et al., 1994). Osteoarthritis is the most common
cause of total knee and hip replacement surgery, and hence imposes a significant economic
as well as quality-of-life burden. Recent estimates suggest that the total cost of
osteoarthritis to the economy, accounting for lost working days, early retirement and
medical treatment, may exceed 2% of the gross domestic product (see, e.g., Yelin, 1998).
The physiological mechanisms which underlie osteoarthritis remain hotly debated (see,
e.g., Felson et al., 2000), but it seems certain that several environmental factors
contribute, including excess mechanical loading of the joints, acute joint injury, and diet,
as well as a strong genetic component. The disease is characterised by the narrowing of
the synovial space in the joint, inflammatory and fibrous changes to the connective
tissue, and altered turnover of connective tissue proteins, including the primary
connective tissue collagen, type II. The most recent studies suggest that osteoarthritis
may result from misregulated connective tissue remodelling in much the same way that
osteoporosis results from misregulated bone remodelling. Whereas osteoporosis is a
disease of quantitatively low bone mineral density, osteoarthritis is a disease of spatially
inappropriate bone mineralisation.

There is a range of other, less common bone disorders, including:

Rickets and osteomalacia are the result of vitamin D deficiency. Vitamin D is required
for absorption of calcium and phosphate and for their proper incorporation into bone
mineral. Deficiency of vitamin D (called rickets in children and osteomalacia in adults)
results in a range of symptoms including low bone mineral density, bone deformation
and, in severe cases, muscle tetany due to depletion of extracellular calcium ion stores.


Hyperparathyroidism (overproduction of parathyroid hormone, or PTH) can have similar
symptoms to rickets. This is unsurprising, since PTH production is stimulated in
rickets in an attempt to maintain the free calcium ion concentration. PTH stimulates
bone resorption by promoting osteoclast activity, and hence can result in symptoms
resembling osteoporosis. Osteomalacia and hyperparathyroidism combined contribute
only a very small fraction of all cases of adult osteoporosis. In almost every case, adult
osteoporosis is due to defective bone deposition rather than overactive resorption (see,
e.g., Guyton, 1991).

Paget's disease of bone is a relatively common condition (affecting as many as 1 in 1000
people in some areas of the world) of unknown cause, characterized by increased bone
turnover and disorganized bone remodeling, with areas of increased osteoclastic and
osteoblastic activity. Although Pagetic bone is often denser than normal bone, the
abnormal architecture causes the bone to be mechanically weak, resulting in bone
deformity and increased susceptibility to pathological fracture.
Multiple myeloma is a cancer of plasma cells. In contrast to most other haematological
malignancies, the tumour cells do not circulate in the blood, but accumulate in the bone
marrow, where they give rise to high levels of cytokines that activate osteoclastic bone
resorption (e.g., interleukin-6). The disease accounts for approximately 20% of all
haematological cancers and is mainly a disease of elderly people.

Balance Between Bone Deposition and Bone Resorption 

All of the bone pathologies listed above result from an imbalance between bone
deposition and bone resorption. If the mechanisms regulating these two processes
become uncoupled, then pathological changes in bone mineral density result. In just a
few cases, the cause of the imbalance seems clear: for example, prolonged estrogen
deficiency (such as due to surgical sterilisation) or lengthy treatment with glucocorticoids
(such as for asthma) both perturb the balance and can lead to rapid demineralisation of
the bone and osteoporosis.

Unfortunately, in the vast majority of cases the mechanisms resulting in loss of balance
are much less clear. The difficulty in identifying the causes stems in part from the
small-scale imbalances that must be occurring. For example, most osteoporotic fractures
do not occur until 20-30 years after the menopause. If, as is generally assumed, the
osteoporosis was initiated by the reduction in estrogen levels after the menopause, then
the demineralisation has been occurring steadily over two or three decades. Since the
bone remodelling process is relatively rapid (complete within 28 days in any given
osteon), we must assume that the imbalance in favour of demineralisation is very small.
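The argument that the imbalance must be small can be made concrete with a back-of-envelope calculation. The Python sketch below is illustrative only: it treats each 28-day remodelling cycle as one compounding step, ignores the fact that only part of the bone surface is remodelling at any one time, and uses a hypothetical 30% cumulative loss that is not a figure from this specification.

```python
def remodelling_cycles(years: float, cycle_days: float = 28.0) -> float:
    """Number of consecutive remodelling cycles that fit into `years`
    (using 365.25-day years)."""
    return years * 365.25 / cycle_days


def per_cycle_imbalance(cumulative_loss: float, n_cycles: float) -> float:
    """Fractional net loss per cycle that compounds to `cumulative_loss`
    over `n_cycles`, i.e. solves (1 - r) ** n_cycles == 1 - cumulative_loss."""
    if not 0.0 <= cumulative_loss < 1.0:
        raise ValueError("cumulative_loss must be in [0, 1)")
    return 1.0 - (1.0 - cumulative_loss) ** (1.0 / n_cycles)


HYPOTHETICAL_LOSS = 0.30  # illustrative assumption only

for years in (20, 30):
    n = remodelling_cycles(years)
    r = per_cycle_imbalance(HYPOTHETICAL_LOSS, n)
    print(f"{years} years: ~{n:.0f} cycles, ~{100 * r:.3f}% net loss per cycle")
```

Under these assumptions, 20 to 30 years corresponds to roughly 260 to 390 cycles, and a 30% cumulative loss implies a net loss on the order of 0.1% per cycle, which illustrates why the per-cycle imbalance is so hard to detect.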


Current treatments 

There are currently two major classes of drugs used in the prevention and treatment of
osteoporosis: (1) hormonally active medications (estrogens, selective estrogen receptor
modulators (SERMs)); and (2) anti-resorptives.

There are presently good data to suggest that the long-term use of hormonally active
medications (usually estrogen, estrogen analogs or conjugated estrogens) after the
menopause in women can prevent bone demineralisation and hence delay the onset of
osteoporosis. The molecular mechanisms involved are not clearly defined, possibly
because they are so complex. However, there are plausible mechanisms which involve
both stimulation of bone deposition and suppression of resorption.

To date, such hormonally active medications, including the new generation of SERMs
such as Raloxifene™ (which have the beneficial effects of estrogen on bone and the
cardiovascular system but not the side effects of breast and uterine hyperplasia that
can increase the risk of cancer), have not achieved widespread use for the treatment
of existing osteoporosis.

At present, treatment of known or suspected bone mineral deficiency is most commonly
by the use of drugs to suppress osteoclast activity. The two most important drug groups
in this class are bisphosphonates (BPs) and non-steroidal anti-inflammatory drugs
(NSAIDs).

Bisphosphonates (also known as diphosphonates) are an important class of drugs used in
the treatment of bone diseases involving excessive bone destruction or resorption, e.g.,
Paget's disease, tumour-associated osteolysis, and also post-menopausal
osteoporosis, where the defect might be in either bone deposition or resorption.
Bisphosphonates are structural analogues of naturally occurring pyrophosphate.
Whereas pyrophosphate consists of two phosphate groups linked by an oxygen atom
(P-O-P), bisphosphonates have two phosphate groups linked by a carbon atom (P-C-P).
This makes bisphosphonates very stable and resistant to degradation. Furthermore, like
pyrophosphate, bisphosphonates have very high affinity for calcium and therefore target
bone mineral in vivo. The carbon atom that links the two phosphate groups has two
side chains attached to it, which can be altered in structure. This gives rise to a multitude
of bisphosphonate compounds with different anti-resorptive potencies. Bone resorption
is mediated by highly specialised, multinucleated osteoclast cells, and bisphosphonate
drugs specifically inhibit the activity and survival of these cells. After intravenous
or oral administration, the bisphosphonates are rapidly cleared from the circulation and
bind to bone mineral. As the mineral is then resorbed and dissolved by osteoclasts, it is
thought that the drug is released from the bone mineral and is internalised by
osteoclasts. Intracellular accumulation of the drugs inhibits the ability of the cells to
resorb bone (probably by interfering with signal transduction pathways or cellular
metabolism) and causes osteoclast apoptosis (see, e.g., Hughes et al., 1997).

PCT/GB02/01862

NSAIDs are widely used in the treatment of inflammatory diseases, but often cause
severe gastro-intestinal (GI) side effects due to their inhibition of the
prostaglandin-generating enzyme cyclooxygenase (COX). Recently developed selective
cyclooxygenase-2 (COX-2) inhibitors offer new treatment strategies which are likely to be
less toxic to the GI tract. NSAIDs developed by NicOx SA (Sophia Antipolis, France)
that contain a nitric oxide (NO)-donor group (NO-NSAIDs) exhibit anti-inflammatory
properties without causing GI side effects. The mechanisms responsible for the
beneficial effects of NSAIDs on bone are not definitively identified, but since the
bone-resorbing osteoclast cells are derived from the circulating monocyte pool, it is not
difficult to imagine why generalised anti-inflammatory treatments might have
anti-resorptive effects. However, another class of powerful anti-inflammatory molecules,
the glucocorticoids and their analogs such as dexamethasone, has the opposite effect to
NSAIDs: chronic dexamethasone treatment (for example, in asthma) induces
demineralisation and leads to symptoms of rapid-onset osteoporosis. Consequently,
while NSAIDs empirically have anti-resorptive properties, further investigations into the
detailed mechanism of action of these drugs are clearly required.

It has recently been discovered that many of the drugs which are used clinically to inhibit
bone resorption, such as bisphosphonates and oestrogen, do so by promoting osteoclast
apoptosis (see, e.g., Hughes et al., 1997). At present, the most commonly used drugs
to suppress osteoclast activity in these diseases are bisphosphonates (BPs) and
non-steroidal anti-inflammatory drugs (NSAIDs).



Limitations of current treatments



There are a number of limitations which impact on the clinical utility of all the available
therapeutic and preventative modalities. For example, both hormonal medications (HRT
and SERMs) and anti-resorptives (BPs and NSAIDs) primarily target resorption. While
this may be useful in, for example, Paget's disease, it is likely to be less useful in
osteoporosis, where the majority of cases have reduced deposition rates as the primary
defect. Of course, because bone mineral density is a balance between deposition and
resorption rates, anti-resorptive strategies can have some efficacy even where the
primary defect is in the rate of deposition.



Possibly because current therapeutics target resorption when suppressed deposition is
the primary defect in osteoporosis, none of the current agents can build bone; instead,
they only halt further demineralisation. Because of the limited availability of
diagnostic techniques, particularly for population screening, treatment cannot usually
begin until clinical symptoms (such as fracture) exist, by which point the bones may
already be dangerously demineralised. In such cases (which are the majority), a therapy
which increases bone mineral density would be desirable. A new treatment based on
abolishing proline deficiency would stimulate the deposition rate and hence be a new
category of therapeutic: one which targets deposition preferentially over resorption.
Therapeutics of this category would be expected to overcome the limitation of being
unable to increase bone mineral density.


Another limitation of existing therapies is that they fail to treat the underlying cause of
the pathology, but rather try to alleviate the symptoms. In part, this is because few
direct causes of osteoporosis have been identified. The inventors have identified a novel
contributory mechanism to the development of osteoporosis and hence have provided
the first therapeutic approach to target one of the direct mechanisms resulting in
pathologically low bone mineral density.

Bone Disorder Diagnostics 

It has long been clear that early diagnosis of bone disorders is essential for good
therapeutic management. Although there are now several effective treatments for
osteoporosis, each one is only able to arrest the further loss of bone mineral density. No
treatment to date has been effective in reversing loss which has already occurred. Thus
early, reliable diagnosis of declining bone mineral density is of the utmost clinical
importance.

Existing diagnosis methods for bone disorders fall into two categories: 

(a) direct observation (for example, bone mineral density scans for osteoporosis
or radiographic assessment for osteoarthritis); and,
(b) indirect observation of molecular markers of remodelling (for example,
collagen breakdown products).

Of the major determinants for bone fracture, only bone mineral density can presently be
determined with any precision and accuracy.

Examples of molecular diagnostics include the measurement of free crosslinks,
hydroxyproline, collagen propeptides, or alkaline phosphatase in serum or urine. Free
crosslinks are produced when collagen is degraded during resorption. Although the
collagen can mostly be broken down to free amino acids, the trimerised hydroxylysine
residues that formed the crosslinks cannot be further metabolised and so accumulate in
the blood until excreted by the kidney in urine. Thus the levels of crosslinks in serum or in
urine will be related to the rate of collagen breakdown (most, but not all, of which will be
occurring in the bone). Tests for hydroxyproline rely on a similar principle: free proline
(that is, proline not incorporated into protein) is never in the hydroxylated form,
hydroxyproline. As a result, the only source of free hydroxyproline in blood is
collagen breakdown. As for crosslinks, the free hydroxyproline generated during
breakdown cannot be metabolised any further and accumulates until excreted by the
kidney. Unfortunately, the level of both of these metabolites (in either serum or urine) is
significantly affected by kidney function.

Collagen is produced as a proprotein which has both an N-terminal and a C-terminal
extension cleaved off prior to incorporation into the extracellular matrix. These
extensions, or propeptides, are then metabolised or excreted. However, the steady-state
level of the propeptides has been suggested to be a marker for collagen deposition,
some, but not all, of which is likely to be occurring in the bone.

Problems with Current Diagnostic Methods 

The gold standard bone densitometry method, DEXA scanning, is too cumbersome and
expensive for routine screening procedures in women without clinical signs of
osteoporosis. It requires specialist apparatus (which is large and expensive to install and
maintain) as well as specialist training for its operation. Although it accurately measures
bone mineral density, and hence provides the benchmark diagnosis of osteoporosis,
it does not accurately predict future fracture risk, suggesting that bone
quality as well as density may also be important (see, for example, the comments
above).

Ultrasound measurements on the heel are simpler to perform, using cheaper apparatus
and requiring less operator training, but the results are generally less able to predict the
presence of either osteoporosis or future fracture risk.
Molecular diagnostics are considerably easier to implement, although in many cases the
reagents required for the assays are expensive to obtain. The major disadvantage of the
markers which have been evaluated to date is that the levels of the breakdown products
in serum or urine are not particularly temporally stable, changing with diurnal rhythm and
also from day to day. As a result, spot measures (i.e., a single specimen taken at a
randomly chosen time) have virtually no diagnostic or prognostic power. Series of
measurements can be used to provide some indication of relative risk for osteoporosis,
but the odds ratio for having osteoporosis is only approximately 2-fold among individuals
with high levels of the turnover markers (see, e.g., Garnero, 1996). Such a weak
association is of little or no practical clinical value; as a result, biochemical markers
of bone metabolism have not found widespread application in the clinical arena, and
have not been considered for population screening.

Another important limitation of current molecular diagnostics is their focus on the products
of bone metabolism (such as crosslinks, hydroxyproline, and collagen propeptides).
These species might offer diagnostic potential, but they provide no information at all
about the underlying causes of the imbalance between deposition and resorption.
Identification of a risk factor that was not a direct marker of bone turnover may offer the
prospect of identifying therapeutic targets as well as having prognostic potential.

NMR Spectroscopy

As discussed above, many aspects of the present invention pertain to methods which 
employ NMR spectra, or data obtained or derived from NMR spectra. 

The principal nucleus studied in biomedical NMR spectroscopy is the proton or 1H
nucleus. This is the most sensitive of all naturally occurring nuclei. The chemical shift
range is about 10 ppm for organic molecules. In addition, 13C NMR spectroscopy, using
either the naturally abundant 1.1% 13C nuclei or employing isotopic enrichment, is useful
for identifying metabolites. The 13C chemical shift range is about 200 ppm. Other nuclei
find special application. These include 15N (in natural abundance or enriched), 19F for
studies of drug metabolism, and 31P for studies of endogenous phosphate biochemistry,
either in vitro or in vivo.
In order to obtain an NMR spectrum, it is necessary to define a "pulse program". At its
simplest, this is the application of a radio-frequency (RF) pulse followed by acquisition of a
free induction decay (FID), a time-dependent, oscillating, decaying voltage which is
digitised in an analog-to-digital converter (ADC). At equilibrium, the nuclear spins are
present in a number of quantum states, and the RF pulse disturbs this equilibrium. The
FID is the result of the spins returning towards the equilibrium state. It is necessary to
choose the length of the pulse (usually a few microseconds) to give the optimum
response.

This and other experimental parameters are chosen on the basis of knowledge and
experience on the part of the spectroscopist. See, for example, T.D.W. Claridge, High-
Resolution NMR Techniques in Organic Chemistry: A Practical Guide to Modern NMR
for Chemists, Oxford University Press, 2000. These choices are based on the observation
frequency to be used and the known properties of the nucleus under study (i.e., the
expected chemical shift range will determine the spectral width, the desired peak
resolution determines the number of data points, the relaxation times determine the
recycle time between scans, etc.). The number of scans to be added is determined by
the concentration of the analyte, the inherent sensitivity of the nucleus under study, and
its abundance (either natural or enhanced by isotopic enrichment).


After data acquisition, a number of manipulations are possible. The FID can be
multiplied by a mathematical function to improve the signal-to-noise ratio or to reduce the
peak line widths; the expert operator has a choice over such parameters. The FID is
then often extended with a number of zeros ("zero filling") and subjected to Fourier
transformation. After this conversion from time-dependent data to frequency-dependent
data, it is necessary to phase the spectrum so that all peaks appear upright; this is done
using two parameters by visual inspection on screen (automatic routines are now
available with reasonable success). At this point the spectrum baseline can be curved.
To remedy this, one defines points in the spectrum where no peaks appear, and these
are taken to be baseline. Usually, a polynomial function is fitted to these points (although
other methods are available), and this function is subtracted from the spectrum to
provide a flat baseline. This can also be done in an automatic fashion. Other
manipulations are also possible. It is possible to extend the FID forwards or backwards
by "linear prediction" to improve resolution or to remove so-called truncation artefacts,
which occur if data acquisition of a scan is stopped before the FID has decayed into the
noise. All of these decisions are also applicable to 2- and 3-dimensional NMR
spectroscopy.
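The processing chain just described (apodisation, zero filling, Fourier transformation and polynomial baseline correction) can be sketched with NumPy. This is a minimal illustration on a simulated FID, not the processing software used in the original work: the function names, the exponential window, the line-broadening value and the choice of baseline points are all assumptions, and phase correction is omitted because the simulated signal is already in phase.

```python
import numpy as np

def process_fid(fid, dwell_time, lb_hz=1.0, zero_fill_to=None):
    """Apodise, zero-fill and Fourier transform a 1D complex FID (hypothetical helper).

    lb_hz is an assumed exponential line-broadening constant in Hz: larger
    values improve signal-to-noise at the cost of broader lines.
    """
    n = fid.size
    t = np.arange(n) * dwell_time
    apodised = fid * np.exp(-np.pi * lb_hz * t)        # exponential multiplication
    size = max(zero_fill_to or n, n)
    padded = np.zeros(size, dtype=complex)             # zero filling
    padded[:n] = apodised
    spectrum = np.fft.fftshift(np.fft.fft(padded))     # time -> frequency domain
    freqs = np.fft.fftshift(np.fft.fftfreq(size, d=dwell_time))
    return freqs, spectrum

def baseline_correct(x, y, baseline_mask, degree=3):
    """Fit a polynomial to points flagged as peak-free and subtract it."""
    coeffs = np.polyfit(x[baseline_mask], y[baseline_mask], degree)
    return y - np.polyval(coeffs, x)

# Simulated FID: one resonance 100 Hz from the carrier, decay constant 0.1 s.
dwell = 1e-3                                            # 1 ms -> 1000 Hz spectral width
t = np.arange(1024) * dwell
fid = np.exp(2j * np.pi * 100.0 * t) * np.exp(-t / 0.1)
freqs, spec = process_fid(fid, dwell, lb_hz=1.0, zero_fill_to=4096)
peak_hz = freqs[np.argmax(spec.real)]

# Baseline correction of a real spectrum sitting on a sloping offset.
x = np.linspace(0.0, 10.0, 1001)
peak = np.exp(-((x - 5.0) ** 2) / (2 * 0.1 ** 2))
curved = peak + 0.2 * x + 1.0
flat = baseline_correct(x, curved, np.abs(x - 5.0) > 1.0, degree=1)
```

Zero filling here only interpolates the spectrum onto a finer frequency grid; it does not add information, which is why it is usually combined with apodisation.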
An NMR spectrum consists of a series of digital data points with a y value (relating to
signal strength) as a function of equally spaced x-values (frequency). These data points
run over the whole of the spectrum. Individual peaks in the spectrum are
identified by the spectroscopist or automatically by software, and the area under each
peak is determined either by integration (summation of the y values of all points over the
peak) or by curve fitting. A peak can be a single resonance or a multiplet of resonances
corresponding to a single type of nucleus in a particular chemical environment (e.g., the
two protons ortho to the carboxyl group in benzoic acid). Integration of the three-
dimensional peak volumes in 2-dimensional NMR spectra is also possible. The intensity of a
peak in an NMR spectrum is proportional to the number of nuclei giving rise to that peak
(if the experiment is conducted under conditions where each successive accumulated
free induction decay (FID) is taken starting at equilibrium). Also, the relative intensity of
peaks from different analytes in the same sample is proportional to the concentration of
that analyte (again, if equilibrium prevails at the start of each scan).
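Integration by summation, as described above, can be illustrated with a short NumPy sketch. The spectrum is synthetic and the function name is hypothetical; the Lorentzian line shape, line width and integration windows are assumptions. Under the fully relaxed conditions described above, the ratio of the two integrals recovers the 3:1 ratio of contributing protons.

```python
import numpy as np

def integrate_region(ppm, intensity, start, end):
    """Sum the intensities of all points with start <= ppm <= end (rectangle rule)."""
    lo, hi = min(start, end), max(start, end)
    mask = (ppm >= lo) & (ppm <= hi)
    step = abs(ppm[1] - ppm[0])          # x-values are equally spaced
    return intensity[mask].sum() * step

def lorentzian(x, x0, width=0.002):
    """Absorption-mode Lorentzian line of unit height centred at x0."""
    return 1.0 / (1.0 + ((x - x0) / width) ** 2)

# Hypothetical spectrum: a 3-proton singlet at 1.33 ppm and a 1-proton singlet at 4.11 ppm.
ppm = np.linspace(10.0, 0.0, 20001)
spectrum = 3 * lorentzian(ppm, 1.33) + 1 * lorentzian(ppm, 4.11)

a = integrate_region(ppm, spectrum, 1.23, 1.43)
b = integrate_region(ppm, spectrum, 4.01, 4.21)
print(round(a / b, 2))   # prints 3.0
```

Because Lorentzian lines have long tails, the absolute integrals depend on the window width; using equal windows for both peaks keeps the ratio unbiased.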

Thus, the term "NMR spectral intensity," as used herein, pertains to some measure
related to the NMR peak area, and may be absolute or relative. NMR spectral intensity
may be, for example, a combination of a plurality of NMR spectral intensities, e.g., a
linear combination of a plurality of NMR spectral intensities.

In the context of NMR spectral intensity, the term "NMR" refers to any type of NMR
spectroscopy.

NMR spectroscopic techniques can be classified according to the number of frequency
axes, and these include 1D-, 2D-, and 3D-NMR. 1D spectra include, for example, single
pulse; water-peak eliminated either by saturation or non-excitation; spin-echo, such as
CPMG (i.e., edited on the basis of spin-spin relaxation); diffusion-edited; and selective
excitation of specific spectral regions. 2D spectra include, for example, J-resolved (JRES);
1H-1H correlation methods, such as NOESY, COSY, TOCSY and variants thereof;
heteronuclear correlation, including direct detection methods, such as HETCOR, and
inverse-detected methods, such as 1H-13C HMQC, HSQC, and HMBC. 3D spectra include
many variants, all of which are combinations of 2D methods, e.g., HMQC-TOCSY,
NOESY-TOCSY, etc. All of these NMR spectroscopic techniques can also be combined
with magic-angle spinning (MAS) in order to study samples other than isotropic liquids,
such as tissues, which are characterised by anisotropic composition.
Preferred nuclei include 1H and 13C. Preferred techniques for use in the present
invention include water-peak eliminated, spin-echo such as CPMG, diffusion edited,
JRES, COSY, TOCSY, HMQC, HSQC, and HMBC.


NMR analysis (especially of biofluids) is carried out at as high a field strength as is
practical, according to availability (very high field machines are not widespread), cost (a
600 MHz instrument costs about £500,000, but a shielded 800 MHz instrument can cost
more than £3,500,000, depending on the nature of accessory equipment purchased),
and ability to accommodate the physical size of the instrument. Maintenance/operational
costs do not vary greatly and are small compared to the capital cost of the machine and
the personnel costs.

Typically, the 1H observation frequency is from about 200 MHz to about 900 MHz, more
typically from about 400 MHz to about 900 MHz, yet more typically from about 500 MHz
to about 750 MHz. 1H observation frequencies of 500 and 600 MHz may be particularly
preferred. Instruments with the following 1H observation frequencies are/were
commercially available: 200, 250, 270 (discontinued), 300, 360 (discontinued), 400, 500,
600, 700, 750, 800, and 900 MHz.


Higher frequencies are used to obtain a better signal-to-noise ratio and greater spectral
dispersion of resonances. This gives a better chance of identifying the molecules giving
rise to the peaks. The benefit is not linear because, in addition to the better dispersion,
the detailed spectral peaks can move from being "second-order", where analysis by
inspection is not possible, towards "first-order", where it is. Both peak positions and
intensities within multiplets change in a non-linear fashion as this progression occurs.
Lower observation frequencies would be used where cost is an issue, but this is likely to
lead to reduced effectiveness for classification and identification of biomarkers.

NMR Spectroscopy: Sample Preparation

NMR spectra can be measured in solid, liquid, liquid crystal or gas states over a range of
temperatures from 120 K to 420 K, and outside this range with specialised equipment.
Typically, NMR analysis of biofluids is performed in the liquid state with a sample
temperature of from about 274 K to about 328 K, but more typically from about 283 K to
about 321 K. An example of a typical temperature is about 300 K.
Lower temperatures would be used to ensure that the biofluid did not suffer from any
decomposition or show any effects of chemical or enzymatic reactions during the data
acquisition. Higher temperatures may be used to improve detection of certain species.
For example, for plasma or serum, lipoproteins undergo a series of phase changes as
the temperature is increased; in particular, the low density lipoprotein (LDL) peak
intensities are rather temperature dependent, and the lines sharpen and broader,
more-difficult-to-detect components become visible as the lipoprotein becomes more "liquid."

Typically, biofluid samples are diluted with solvent prior to NMR analysis. This is done 
for a variety of reasons, including: to lessen solution viscosity, to control the pH of the 
solution, and to allow addition of reagents and reference materials. 

An example of a typical dilution solvent is a solution of 0.9% by weight of sodium chloride
in D2O. The D2O lessens the overall concentration of H2O and eases the technical
requirements in the suppression of the solvent water NMR resonance, which is necessary
for optimum detection of metabolite NMR signals. The deuterium nuclei of the D2O also
provide an NMR signal for locking the magnetic field, enabling the exact co-registration
of successive scans.

Depending on the available amount of the biofluid, the dilution ratio is typically from
about 1:50 to about 5:1 by volume, but more typically from about 1:20 to about 1:1 by
volume. An example of a typical dilution ratio is 3:7 by volume (e.g., 150 µL sample,
350 µL solvent), typical for conventional 5 mm NMR tubes and for flow-injection NMR
spectroscopy.

Typical sample volumes for NMR analysis are from about 50 µL (e.g., for microprobes) to
about 2 mL. An example of a typical sample volume is about 500 µL.

NMR peak positions (chemical shifts) are measured relative to that of a known standard
compound, usually added directly to the sample. For biofluids such as urine, this is
commonly a partially deuterated form of TSP, i.e., 3-trimethylsilyl-(2,2,3,3-2H4)-propionate
sodium salt. For biofluids containing high levels of proteins, this substance is not
suitable, since it binds to proteins and shows a broadened NMR line. Added formate
anion (e.g., as a salt) can be used in such cases, as for blood plasma.
NMR Spectroscopy: Manipulation of NMR Spectra

NMR spectra are typically acquired, and subsequently handled, in digitised form.
Conventional methods of spectral pre-processing of (digital) spectra are well known, and
include, where applicable, signal averaging, Fourier transformation (and other
transformation methods), phase correction, baseline correction, smoothing, and the like
(see, for example, Lindon et al., 1980).

Modern spectroscopic methods often permit the collection of high or very high resolution
spectra. In digital form, even a simple spectrum (e.g., signal versus spectroscopic
parameter) may have many thousands, if not tens of thousands, of data points. It is often
desirable to reduce or compress the data to give fewer data points, both for practical
computing reasons and also to effect some degree of signal averaging to compensate
for physical effects, such as pH variation, compartmentalisation, and the like. The
resulting data may be referred to as "spectral data."

For example, a typical NMR spectrum is recorded as signal intensity versus chemical
shift (δ), which ranges from about δ 0 to δ 10. At a typical chemical shift resolution of
about 10^-3 to 10^-4 ppm, the spectrum in digital form comprises about 10,000 to 100,000
data points. As discussed above, it is often desirable to compress this data, for example,
by a factor of about 10 to 100, to about 1000 data points.

For example, in one approach, the chemical shift axis, δ, is "segmented" into "buckets"
or "bins" of a specific length. For a 1-D NMR spectrum which spans the range from δ
0 to δ 10, using a bucket length, Δδ, of 0.04 yields 250 buckets, for example, δ 10.0-
9.96, δ 9.96-9.92, δ 9.92-9.88, etc., usually reported by their midpoint, for example, δ
9.98, δ 9.94, δ 9.90, etc. The signal intensity within a given bucket may be averaged or
integrated, and the resulting value reported. In this way, a spectrum with, for example,
100,000 original data points can be compressed to an equivalent spectrum with, for
example, 250 data points.
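The bucketing step can be sketched with `numpy.histogram`, whose `weights` argument sums the intensities falling in each bin. This is an illustrative sketch, not the behaviour of any commercial package; the function name and the `average` option are assumptions. Bucket edges are built in ascending order (as `numpy.histogram` requires) and the result is reversed so that the first bucket is δ 10.0-9.96 with midpoint δ 9.98, matching the text.

```python
import numpy as np

def bucket_spectrum(ppm, intensity, start=10.0, end=0.0, width=0.04, average=False):
    """Integrate (or average) a 1D spectrum into equal-width chemical-shift buckets.

    Returns bucket midpoints, high to low shift, and one value per bucket.
    """
    n_buckets = int(round((start - end) / width))
    edges = np.linspace(end, start, n_buckets + 1)           # ascending bucket edges
    sums, _ = np.histogram(ppm, bins=edges, weights=intensity)
    if average:
        counts, _ = np.histogram(ppm, bins=edges)
        sums = sums / np.maximum(counts, 1)                  # mean intensity per bucket
    mids = (edges[:-1] + edges[1:]) / 2
    return mids[::-1], sums[::-1]                            # delta 9.98 first

# Hypothetical 100,000-point spectrum of constant intensity.
ppm = np.linspace(10.0, 0.0, 100000)
intensity = np.ones_like(ppm)
mids, buckets = bucket_spectrum(ppm, intensity)
print(len(mids), mids[:3])   # 250 buckets, midpoints 9.98, 9.94, 9.90
```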

A similar approach can be applied to 2-D spectra, 3-D spectra, and the like. For 2-D
spectra, the "bucket" approach may be extended to a "patch". For 3-D spectra, the
"bucket" approach may be extended to a "volume". For example, a 2-D NMR
spectrum which spans the range from δ 0 to δ 10 on both axes, using a patch of
Δδ 0.1 × Δδ 0.1, yields 10,000 patches. In this way, a spectrum with perhaps a very large
number of original data points can be compressed to an equivalent spectrum of 10^4 data points.
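The patch approach for a 2-D spectrum can be sketched by reshaping the intensity matrix into blocks and summing each block. This assumes, purely for illustration, a hypothetical 1000 × 1000 point spectrum spanning δ 0-10 on both axes (0.01 ppm per point), so a 0.1 ppm patch covers 10 points per axis and yields the 100 × 100 = 10,000 patches quoted in the text; the function name is hypothetical.

```python
import numpy as np

def patch_spectrum(matrix, factor_f1, factor_f2):
    """Sum a 2D spectrum over non-overlapping rectangular patches.

    Assumes each axis length is an exact multiple of its patch factor.
    """
    n1, n2 = matrix.shape
    if n1 % factor_f1 or n2 % factor_f2:
        raise ValueError("axis lengths must be multiples of the patch factors")
    blocks = matrix.reshape(n1 // factor_f1, factor_f1, n2 // factor_f2, factor_f2)
    return blocks.sum(axis=(1, 3))       # collapse the within-patch axes

# Hypothetical 2D spectrum: 1000 x 1000 points over delta 0-10 on both axes.
spectrum_2d = np.random.default_rng(0).random((1000, 1000))
patches = patch_spectrum(spectrum_2d, 10, 10)    # 0.1 x 0.1 ppm patches
print(patches.shape, patches.size)               # (100, 100) 10000
```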

In this context, the equivalent spectrum may be referred to as "a spectral data set," "a
data set comprising spectral data," etc.

Software for such processing of NMR spectra, for example AMIX (Analysis of MIXtures, v2.5,
Bruker Analytik, Rheinstetten, Germany), is commercially available.

Often, certain spectral regions carry no real diagnostic information, or carry conflicting
biochemical information, and it is often useful to remove these "redundant" regions
before performing detailed analysis. In the simplest approach, the data points are
deleted. In another simple approach, the data in the redundant regions are replaced with
zero values.

For example, due to the dynamic range problem with water in comparison with other
molecules, the water resonance (around δ 4.7) is suppressed. However, small variations
in water suppression remain, and these variations can undesirably complicate analysis.
Similarly, variations in water suppression may also affect the urea signal (around δ 6.0),
by cross saturation. Therefore, it is often useful to delete certain spectral regions, for
example, from about δ 4.5 to 6.0 (e.g., δ 4.52 to 6.00).
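Both ways of handling redundant regions (deleting the columns or replacing them with zeros) can be sketched on a bucketed data matrix. The function name, the inclusive region limits and the matrix layout (samples as rows, buckets as columns) are assumptions for illustration; the example removes the δ 4.52-6.00 region from 250 buckets of width 0.04, which covers 37 buckets.

```python
import numpy as np

def exclude_regions(ppm, data, regions, mode="delete"):
    """Delete, or set to zero, descriptor columns lying within any excluded region.

    ppm: 1D array of bucket midpoints; data: samples x buckets matrix;
    regions: iterable of (low, high) chemical-shift limits, inclusive.
    """
    mask = np.zeros(ppm.shape, dtype=bool)
    for low, high in regions:
        mask |= (ppm >= low) & (ppm <= high)
    if mode == "delete":
        return ppm[~mask], data[:, ~mask]
    if mode == "zero":
        zeroed = data.astype(float).copy()
        zeroed[:, mask] = 0.0
        return ppm, zeroed
    raise ValueError("mode must be 'delete' or 'zero'")

# 250 buckets of width 0.04 (midpoints delta 9.98, 9.94, ..., 0.02) for 5 spectra.
mids = 10.0 - 0.04 * (np.arange(250) + 0.5)
data = np.ones((5, 250))
kept_ppm, kept = exclude_regions(mids, data, [(4.52, 6.00)])
_, zeroed = exclude_regions(mids, data, [(4.52, 6.00)], mode="zero")
print(kept.shape, zeroed.shape)   # (5, 213) (5, 250)
```

Deleting columns shrinks the descriptor set, whereas zeroing keeps the original bucket indexing, which can be convenient when plotting loadings against chemical shift.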

In general, NMR data are handled as a data matrix. Typically, each row in the matrix
corresponds to an individual sample (often referred to as a "data vector"), and the entries
in the columns are, for example, the spectral intensity of a particular data point, at a
particular δ or Δδ (often referred to as "descriptors").

It is often useful to pre-process data, for example, by addressing missing data,
translation, scaling, weighting, etc.

Multivariate projection methods, such as principal component analysis (PCA) and partial
least squares analysis (PLS), are so-called scaling-sensitive methods. By using prior
knowledge and experience about the type of data studied, the quality of the data prior to
multivariate modelling can be enhanced by scaling and/or weighting. Adequate scaling
and/or weighting can reveal the important and interesting variation hidden within the
data, and therefore make subsequent multivariate modelling more efficient. Scaling and
weighting may be used to place the data in the correct metric, based on knowledge and
experience of the studied system, and therefore reveal patterns already inherently
present in the data.

If at all possible, missing data, for example, gaps in column values, should be avoided.
However, if necessary, such missing data may be replaced or "filled" with, for example, the
mean value of a column ("mean fill"); a random value ("random fill"); or a value based on
a principal component analysis ("principal component fill"). Each of these different
approaches will have a different effect on subsequent pattern recognition (PR) analysis.
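Of the options listed, "mean fill" is the simplest to illustrate. The sketch below assumes missing entries are encoded as NaN (an assumption; other encodings are possible) and uses a hypothetical function name; random fill and principal component fill are not shown.

```python
import numpy as np

def mean_fill(data):
    """Replace NaN entries with the mean of the non-missing values in the same column.

    A column that is entirely missing has no mean and stays NaN (numpy also warns).
    """
    filled = np.asarray(data, dtype=float).copy()
    col_means = np.nanmean(filled, axis=0)
    rows, cols = np.where(np.isnan(filled))
    filled[rows, cols] = col_means[cols]
    return filled

X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [3.0, np.nan]])
print(mean_fill(X))   # [[1. 2.] [2. 4.] [3. 3.]]
```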


"Translation" of the descriptor coordinate axes can be useful. Examples of such
translation include normalisation and mean centring.

"Normalisation" may be used to remove sample-to-sample variation. Many normalisation
approaches are possible, and they can often be applied at any of several points in the
analysis. Usually, normalisation is applied after redundant spectral regions have been
removed. In one approach, each spectrum is normalised (scaled) by a factor of 1/A,
where A is the sum of the absolute values of all of the descriptors for that spectrum. In
this way, each data vector has the same length (measured as the sum of the absolute
values of its entries), specifically, 1. For example, if the sum
of the absolute values of intensities for each bucket in a particular spectrum is 1067, then
the intensity for each bucket for this particular spectrum is scaled by 1/1067.
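The 1/A normalisation can be sketched in a few lines; the function name and the three-bucket example (chosen so that the absolute intensities sum to 1067, as in the text) are illustrative assumptions.

```python
import numpy as np

def normalise_total(data):
    """Scale each row (one spectrum) by 1/A, where A is the sum of its absolute values."""
    data = np.asarray(data, dtype=float)
    totals = np.abs(data).sum(axis=1, keepdims=True)
    return data / totals

# Hypothetical three-bucket spectrum whose absolute intensities sum to 1067.
spectrum = np.array([[500.0, 400.0, 167.0]])
normalised = normalise_total(spectrum)
print(np.abs(normalised).sum())   # 1.0 (up to rounding)
```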

"Mean centring" may be used to simplify Interpretation. Usually, for each descriptor, the 
average value of that descriptor for all samples is subtracted. In this way, the mean of a 
25 descriptor coincides with the origin, and all descriptors are "centred" at zero. For 
example, if the average intensity at 5 10.0-9.96, for all spectra, is 1.2 units, then the 
intensity at 5 10.0-9.96, for all spectra, is reduced by 1.2 units. 
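Mean centring operates column by column (one descriptor at a time). In the minimal sketch below, with a hypothetical function name and data, the first column plays the role of the δ 10.0-9.96 bucket with a mean of 1.2 units.

```python
import numpy as np

def mean_centre(data):
    """Subtract each descriptor's (column's) mean over all samples."""
    data = np.asarray(data, dtype=float)
    return data - data.mean(axis=0)

# Hypothetical intensities for three spectra; column 0 has mean 1.2 units.
X = np.array([[0.8, 5.0],
              [1.2, 7.0],
              [1.6, 9.0]])
Xc = mean_centre(X)
print(Xc[:, 0])   # approximately [-0.4, 0.0, 0.4]
```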

In "unit variance scaling," data can be scaled to equal variance. Usually, the value of
each descriptor is scaled by 1/StDev, where StDev is the standard deviation for that
descriptor for all samples. For example, if the standard deviation at δ 10.0-9.96, for all
spectra, is 2.5 units, then the intensity at δ 10.0-9.96, for all spectra, is scaled by 1/2.5 or
0.4. Unit variance scaling may be used to reduce the impact of "noisy" data. For
example, some metabolites in biofluids show a strong degree of physiological variation
(e.g., diurnal variation, dietary-related variation) that is unrelated to any
pathophysiological process. Without unit variance scaling, these noisy metabolites may
dominate subsequent analysis.
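Unit variance scaling, as described, divides each descriptor by its standard deviation without centring. This sketch assumes the sample standard deviation (`ddof=1`) and leaves constant columns unchanged to avoid division by zero; both choices, and the function name, are assumptions rather than part of the text.

```python
import numpy as np

def unit_variance_scale(data, ddof=1):
    """Divide each descriptor (column) by its standard deviation over all samples."""
    data = np.asarray(data, dtype=float)
    sd = data.std(axis=0, ddof=ddof)
    sd[sd == 0] = 1.0                 # leave constant descriptors unchanged
    return data / sd

rng = np.random.default_rng(0)
# Hypothetical descriptors whose natural spreads differ by four orders of magnitude.
X = rng.normal(size=(50, 3)) * np.array([100.0, 1.0, 0.01])
X_uv = unit_variance_scale(X)
print(X_uv.std(axis=0, ddof=1))   # each approximately 1.0
```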

"Pareto scaling" is, in some sense, intemiediate between mean centering and unit 
variance scaling. In effect, smaller peaks In the spectra can influence the model to a 
higher degree than for the mean centered case. Also, the loadings are, in general, more 
interpretable than for unit variance based models. In pareto scaling, the value of each 
descriptor is scaled by 1/sqrt(StDev), where StDev is the standard deviation for that 
descriptor for all samples. In this way, each descriptor has a variance numerically equal 
to its initial standard deviation. The pareto scaling may be perfonned, for example, on 
raw data or mean centered data. 
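The claim that Pareto scaling leaves each descriptor with a variance equal to its original standard deviation follows from Var(x/sqrt(s)) = s²/s = s, and is easy to check numerically. The sketch assumes the sample standard deviation (`ddof=1`) and a hypothetical function name; `centre=True` corresponds to Pareto scaling of mean-centred data.

```python
import numpy as np

def pareto_scale(data, centre=True, ddof=1):
    """Scale each descriptor by 1/sqrt(StDev), optionally after mean centring."""
    data = np.asarray(data, dtype=float)
    sd = data.std(axis=0, ddof=ddof)
    sd[sd == 0] = 1.0                  # avoid division by zero for constant columns
    x = data - data.mean(axis=0) if centre else data
    return x / np.sqrt(sd)

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3)) * np.array([25.0, 4.0, 0.5])
P = pareto_scale(X)
# Variance after scaling equals the original standard deviation of each column.
print(P.var(axis=0, ddof=1), X.std(axis=0, ddof=1))
```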

"Logarithmic scaling" may be used to assist interpretation when data have a positive 
skew and/or when data spans a large range, e.g., several orders of magnitude. Usually, 
for each descriptor, the value is replaced by the logarithm of that value. For example, 
the Intensity at 6 10.0-9.96 is replaced the logarithm of the intensity at 5 10.0-9.96. for all 
spectra. 
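A minimal logarithmic-scaling sketch follows. The text does not specify a base, so base 10 is an assumption, as is the optional offset used to keep zero or slightly negative baseline intensities finite; the function name is hypothetical.

```python
import numpy as np

def log_scale(data, offset=0.0):
    """Replace each value by log10(value + offset).

    The base-10 logarithm and the offset are illustrative assumptions.
    """
    shifted = np.asarray(data, dtype=float) + offset
    if np.any(shifted <= 0):
        raise ValueError("all values must be positive after adding the offset")
    return np.log10(shifted)

# Hypothetical intensities spanning three orders of magnitude.
X = np.array([[1.0, 10.0, 1000.0],
              [2.0, 20.0, 500.0]])
X_log = log_scale(X)
print(X_log[0])   # [0. 1. 3.]
```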

In "equal range scaling," each descriptor is divided by the range of that descriptor for all
samples. In this way, all descriptors have the same range, that is, 1. For example, if, at
δ 10.0-9.96, for all spectra, the largest value is 87 units and the smallest value is 1, then
the range is 86 units, and the intensity at δ 10.0-9.96, for all spectra, is divided by 86
units. However, this method is sensitive to the presence of outlier points.
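Equal range scaling can be sketched as follows, reproducing the worked example (largest value 87, smallest 1, range 86). The function name and the handling of zero-range columns are assumptions.

```python
import numpy as np

def range_scale(data):
    """Divide each descriptor (column) by its range (max - min) over all samples."""
    data = np.asarray(data, dtype=float)
    spread = data.max(axis=0) - data.min(axis=0)
    spread[spread == 0] = 1.0          # leave constant descriptors unchanged
    return data / spread

# Worked example from the text: values at one bucket range from 1 to 87 units.
X = np.array([[87.0], [1.0], [43.0]])
X_r = range_scale(X)
print(X_r[:, 0])   # [87/86, 1/86, 0.5]
```

A single outlying value inflates the range and shrinks every other value in that column, which is the outlier sensitivity noted above.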

In "autoscaling," each descriptor is mean centred and unit variance scaled. This
technique is very useful because each descriptor is then weighted equally and, in the
case of NMR descriptors, large and small peaks are treated with equal emphasis. This
can be important for metabolites present at very low, but still detectable, levels.
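Autoscaling combines the two previous operations column by column. In this sketch (hypothetical function name, sample standard deviation assumed), a large peak and a low-level peak end up with identical mean and variance, illustrating the "equal emphasis" point.

```python
import numpy as np

def autoscale(data, ddof=1):
    """Mean centre each descriptor and divide it by its standard deviation."""
    data = np.asarray(data, dtype=float)
    sd = data.std(axis=0, ddof=ddof)
    sd[sd == 0] = 1.0
    return (data - data.mean(axis=0)) / sd

rng = np.random.default_rng(1)
# Hypothetical large peak (mean 1000) and small, low-level peak (mean 0.05).
X = np.column_stack([rng.normal(1000.0, 50.0, 40), rng.normal(0.05, 0.01, 40)])
X_auto = autoscale(X)
print(X_auto.mean(axis=0), X_auto.std(axis=0, ddof=1))   # ~[0, 0] and [1, 1]
```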

Several supervised methods of scaling data are also known. Some of these can provide
a measure of the ability of a parameter (e.g., a descriptor) to discriminate between
classes, and can be used to improve classification by stretching a separation.

For example, in "variance weighting," the variance weight of a single parameter (e.g., a
descriptor) is calculated as the ratio of the inter-class variances to the sum of the intra-
class variances. A large value means that this variable is discriminating between the
classes. For example, if the samples are known to fall into two classes (e.g., a training
set), it is possible to examine the mean and variance of each descriptor within each class.
If a descriptor has very different class mean values and small within-class variances,
then it will be good at separating the classes.

"Feature weighting" is a more general description of variance weighting, where not only
the mean and standard deviation of each descriptor are calculated, but other well known
weighting factors, such as the Fisher weight, are used.
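For two classes, a widely used form of the Fisher weight is the squared difference of class means divided by the sum of within-class variances, which captures the "large separation, small spread" idea described above. The sketch below uses that form; it is one common definition, not necessarily the exact weighting used by any particular software, and the function name and data are hypothetical.

```python
import numpy as np

def fisher_ratio(data, labels):
    """Two-class Fisher ratio per descriptor: (m1 - m2)^2 / (s1^2 + s2^2)."""
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    if classes.size != 2:
        raise ValueError("exactly two classes are required")
    a = data[labels == classes[0]]
    b = data[labels == classes[1]]
    between = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    within = a.var(axis=0, ddof=1) + b.var(axis=0, ddof=1)
    return between / within

# Descriptor 0 separates the classes tightly; descriptor 1 does not separate them.
data = np.array([[1.0, 5.0], [1.1, 3.0], [0.9, 4.0],
                 [3.0, 4.5], [3.1, 3.5], [2.9, 4.0]])
labels = [0, 0, 0, 1, 1, 1]
w = fisher_ratio(data, labels)
print(w)                    # approximately [200, 0]
weighted = data * w         # one way to apply the weights to the descriptors
```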

Multivariate Statistical Analysis

As discussed above, multivariate statistical analysis methods, including pattern
recognition methods, are often the most convenient and efficient way to analyse complex
data, such as NMR spectra.


For example, such analysis methods may be used to identify diagnostic
spectral windows and/or diagnostic species for a particular condition under study.

Also, such analysis methods may be used to form a predictive model, and then use that
model to classify test data. For example, one convenient and particularly effective
method of classification employs multivariate statistical analysis modelling, first to form a
model (a "predictive mathematical model") using data ("modelling data") from samples of
known class (e.g., from subjects known to have, or not have, a particular condition), and
second to classify an unknown sample (e.g., "test data") as having, or not having, that
condition.

Examples of pattern recognition methods include, but are not limited to, Principal 
Component Analysis (PCA) and Partial Least Squares-Discriminant Analysis (PLS-DA). 

PCA is a bilinear decomposition method used for overviewing "clusters" within
multivariate data. The data are represented in K-dimensional space (where K is equal to
the number of variables) and reduced to a few principal components (or latent variables)
which describe the maximum variation within the data, independent of any knowledge of
class membership (i.e., "unsupervised"). The principal components are displayed as a
set of "scores" (t), which highlight clustering, trends, or outliers, and a set of "loadings" (p),
which highlight the influence of input variables on t (see, for example, Kowalski et al.,
1986).

The PCA decomposition can be described by the following equation: 

$$\mathbf{X} = \mathbf{T}\mathbf{P}' + \mathbf{E}$$

where T is the set of scores explaining the systematic variation between the
observations in X, and P is the set of loadings explaining the between-variable variation
and providing the explanation for clusters, trends, and outliers in the score space. The
non-systematic part of the variation not explained by the model forms the residuals, E.
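The decomposition X = TP' + E can be computed from the singular value decomposition of the mean-centred data matrix. The sketch below is one standard way to do this (chemometrics packages often use the iterative NIPALS algorithm instead); mean centring, the function name and the two-group synthetic data are assumptions for illustration.

```python
import numpy as np

def pca(X, n_components):
    """PCA of mean-centred X via SVD, so that Xc = T P' + E.

    Returns scores T, loadings P, residuals E and the fraction of the total
    variance explained by each retained component.
    """
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T          # loadings: variables x components
    T = Xc @ P                       # scores: samples x components
    E = Xc - T @ P.T                 # residuals not explained by the model
    explained = s[:n_components] ** 2 / np.sum(s ** 2)
    return T, P, E, explained

rng = np.random.default_rng(3)
# Two hypothetical groups of 15 samples that differ in the first two variables.
X = rng.normal(size=(30, 8))
X[15:, :2] += 4.0
T, P, E, explained = pca(X, n_components=2)
# The group difference should appear along the first principal component.
print(T[:15, 0].mean(), T[15:, 0].mean(), explained)
```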

PLS-DA is a supervised multivariate method yielding latent variables describing
maximum separation between known classes of samples. PLS-DA is based on PLS,
which is the regression extension of the PCA method explained earlier. Whereas PCA
works to explain maximum variation between the studied samples, PLS-DA seeks to
explain maximum separation between known classes of samples in the data (X). This is
done by a PLS regression against a "dummy vector or matrix" (Y) carrying the class-
separating information. The calculated PLS components will thereby be more focused
on describing the variation separating the classes in X, if this information is present in the
data. From an interpretation point of view, all the features of PLS can be used, which
means that the variation can be interpreted in terms of scores (t, u), loadings (p, c), PLS
weights (w) and regression coefficients (b). The fact that a regression is carried out
against a known class separation means that PLS-DA is a supervised method and
that class membership has to be known prior to the actual modelling. Once a model
is calculated and validated, it can be used for prediction of class membership for "new"
unknown samples. Judgement of class membership is done on the basis of predicted class
membership (Ypred), predicted scores (tpred) and predicted residuals (DModXpred),
using statistical significance limits for the decision. See, for example, Sjöström et al.,
1986; Ståhle et al., 1987.



In PLS, the variation between the objects in X is described by the X-scores, T, and the
variation in the Y-block regressed against is described by the Y-scores, U. In PLS-DA,
the Y-block is a "dummy vector or matrix" describing the class membership of each
observation. Basically, what PLS does is to maximize the covariance between T and U.
For each component, a PLS weight vector, w, is calculated, containing the influence of
each X-variable on the explanation of the variation in Y. Together, the weight vectors
form a matrix, W, containing the variation in X that maximizes the covariance between
the scores T and U for each calculated component. For PLS-DA, this means that the
weights, W, contain the variation in X that is correlated to the class separation described
in Y. The Y-block matrix of weights is designated C. A matrix of X-loadings, P, is also
calculated. Apart from their use in interpretation, these loadings are used to perform the
proper decomposition of X.

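The PLS decomposition of X and Y can hence be described as follows:

$$\mathbf{X} = \mathbf{T}\mathbf{P}' + \mathbf{E}$$

$$\mathbf{Y} = \mathbf{T}\mathbf{C}' + \mathbf{F}$$

The PLS regression coefficients, B, are then given by:

$$\mathbf{B} = \mathbf{W}(\mathbf{P}'\mathbf{W})^{-1}\mathbf{C}'$$

The estimate of Y, Ŷ, can then be calculated according to the following formula:

$$\hat{\mathbf{Y}} = \mathbf{X}\mathbf{W}(\mathbf{P}'\mathbf{W})^{-1}\mathbf{C}' = \mathbf{X}\mathbf{B}$$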
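The coefficient formula B = W(P'W)^-1 C' can be checked numerically with a compact NIPALS implementation for a single dummy response column (PLS1), which covers the two-class PLS-DA case. This is a sketch under stated assumptions, not the software used in the original work: X and y are mean centred, the function names are hypothetical, and the 0.5 decision threshold is a simplification of the statistical significance limits described above. With as many components as variables, PLS reduces to ordinary least squares, which gives a convenient correctness check.

```python
import numpy as np

def pls1_nipals(X, y, n_components):
    """NIPALS PLS for one response column (e.g. a 0/1 class dummy vector).

    Returns B = W (P'W)^-1 c' for mean-centred data, plus the column means.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xa = X - x_mean                    # deflated X, starting from centred X
    ya = y - y_mean                    # deflated y
    W, P, C = [], [], []
    for _ in range(n_components):
        w = Xa.T @ ya
        w /= np.linalg.norm(w)         # PLS weight vector
        t = Xa @ w                     # X-scores
        tt = t @ t
        p = Xa.T @ t / tt              # X-loadings
        c = ya @ t / tt                # y-weight
        Xa = Xa - np.outer(t, p)       # deflate X
        ya = ya - c * t                # deflate y
        W.append(w)
        P.append(p)
        C.append(c)
    W, P, C = np.array(W).T, np.array(P).T, np.array(C)
    B = W @ np.linalg.solve(P.T @ W, C)
    return B, x_mean, y_mean

def pls_predict(X, B, x_mean, y_mean):
    """Y-hat = (X - mean) B + mean(y)."""
    return (X - x_mean) @ B + y_mean

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 5))
X[10:, 0] += 8.0                        # class 1 differs strongly in descriptor 0
y = np.array([0.0] * 10 + [1.0] * 10)   # dummy class vector
B, xm, ym = pls1_nipals(X, y, n_components=2)
predicted_class = (pls_predict(X, B, xm, ym) > 0.5).astype(float)
```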

Both of the pattern recognition algorithms exemplified herein (PCA, PLS-DA) rely on
extraction of linear associations between the input variables. When such linear
relationships are insufficient, neural network-based pattern recognition techniques can in
some cases improve the ability to classify individuals on the basis of the many inter-
related input variables (see, e.g., Ala-Korpela et al., 1995; Hiltunen et al., 1995).
Nevertheless, the methods applied herein are sufficiently powerful to allow classification
of the individuals studied, and they provide an additional benefit over neural network
methods in that they allow some information to be gained as to which aspects of the input
dataset were particularly important in allowing classification to be made.

Spurious or irregular data in spectra ("outliers"), which are not representative, are
preferably identified and removed. Common reasons for irregular data ("outliers")
include spectral artefacts, such as poor phase correction, poor baseline correction, poor
chemical shift referencing, and poor water suppression, and biological effects, such as
bacterial contamination, shifts in the pH of the biofluid, toxin- or disease-induced
biochemical responses, and other conditions, e.g., pathological conditions, which have
metabolic consequences, e.g., diabetes.

Outliers are identified in different ways depending on the method of analysis used. For
example, when using principal component analysis (PCA), small numbers of samples
lying far from the rest of the replicate group can be identified by eye as outliers. A more
objective means of identification for PCA is to use Hotelling's T² test, which is the
multivariate version of the well-known Student's t-test used in univariate statistics. For
any given sample, the T² value can be calculated and compared with a standard
value within which a chosen fraction (e.g., 95%) of the samples would normally lie.
Samples with T² values substantially outside this limit can then be flagged as outliers.

Also, when using more sophisticated supervised methods, such as SIMCA or PNNs, a
similar method is used. A confidence level (e.g., 95%) is selected and the region of
multivariate space corresponding to confidence values above this limit is determined.
This region can be displayed graphically in several different ways (for example, by
plotting the critical T² ellipse on a PCA scores plot). Any samples falling outside the
high-confidence region are flagged as potential outliers.

Confidence limits for outlier detection are also calculated in the residual direction,
expressed as the distance to model in X (DModX).

Briefly, DModX is the perpendicular distance of an object to the principal component (or
to the plane or hyperplane made up by two or more principal components). In the
SIMCA software, DModX is calculated as:

$$\mathrm{DModX} = v\sqrt{\frac{\sum e^{2}}{K - A}}$$

wherein e is the residual for a single observation (the squared residuals being summed
over the K variables);
K is the number of original variables in the data set;
A is the number of principal components in the model;
v is a correction factor, based on the number of observations (N) and the number of
principal components (A), and is slightly larger than one.
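A minimal NumPy sketch of the DModX formula above follows. It assumes the model is a PCA
fitted by SVD to mean-centred data; the helper names are hypothetical, and the default
correction factor v = sqrt(N / (N - A - 1)) is an assumption chosen only because it is
slightly larger than one, not the factor used by the SIMCA software.

```python
import numpy as np


def pca_residuals(X, n_components):
    """Residual matrix E after fitting n_components PCs to the mean-centred data."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T              # loadings (K x A)
    T = Xc @ P                           # scores (N x A)
    return Xc - T @ P.T


def dmodx(E, n_components, v=None):
    """DModX per observation: v * sqrt(sum of squared residuals / (K - A))."""
    N, K = E.shape
    if v is None:
        # Assumed correction factor (slightly > 1); not taken from the source text.
        v = np.sqrt(N / (N - n_components - 1))
    return v * np.sqrt(np.sum(E**2, axis=1) / (K - n_components))
```

For instance, an observation with residuals (3, 4, 0) in a one-component model of three
variables gives DModX = v * sqrt(25 / 2).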

The outliers in this direction are not as severe as those occurring in the score direction,
but they should always be carefully examined before deciding whether to include them in
the modelling. In general, all outliers are thoroughly investigated, for example, by
examining the contributing loadings and distance to model (DModX), as well as by
visually inspecting the original NMR spectrum for deviating features, before removing
them from the model. Outlier detection by an automatic algorithm is also possible, using
the scores and the residual distance to model (DModX) described above.

When using PLS methods, the distance to the model in Y (DModY) can also be
calculated in the same way.

Data Filtering 

Although pattern recognition methods may be applied to "unfiltered" data, it is often
preferable to first filter the data to remove irrelevant variation.

In one method, latent variables which are of no interest may be removed by "filtering." 

Examples of filtering methods include the regression of descriptor variables against an
index based on sample class to eliminate variables with low correlation to the predefined
classes. Related methods include target rotation (see, e.g., Kvalheim et al., 1989) and
PCT filtering (see, e.g., Sun, 1997). In these methods, the removed variation is not
necessarily completely uncorrelated with sample class (i.e., orthogonal).
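As a simple illustration of the first, non-orthogonal approach, the sketch below correlates
each descriptor variable with a numeric class index and keeps only the variables whose
absolute Pearson correlation reaches a threshold. The function name and the threshold value
are hypothetical choices for illustration.

```python
import numpy as np


def correlation_filter(X, class_index, min_abs_r=0.3):
    """Keep columns of X whose |Pearson r| with the class index is >= min_abs_r."""
    X = np.asarray(X, dtype=float)
    c = np.asarray(class_index, dtype=float)
    Xc = X - X.mean(axis=0)
    cc = c - c.mean()
    # Pearson correlation of every column with the class index in one step;
    # constant columns give r = nan and are therefore dropped.
    with np.errstate(invalid="ignore", divide="ignore"):
        r = (Xc.T @ cc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(cc))
    keep = np.abs(r) >= min_abs_r
    return X[:, keep], keep, r
```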

In another method, latent variables which are orthogonal to some variation or class index
of interest are removed by "orthogonal filtering." Here, variation in the data which is not
correlated to (i.e., is orthogonal to) the class-separating variation of interest may be
removed. Such methods are, in general, more efficient than non-orthogonal filtering
methods.


Various orthogonal filtering methods have been described (see, e.g., Wold et al., 1998a;
Fearn, 2000; Anderson, 1999; Westerhuis et al., 2001; Wise et al., 2001).

One preferred orthogonal filtering method is conventionally referred to as Orthogonal
Signal Correction (OSC), wherein latent variables orthogonal to the variation of interest
are removed. See, for example, Wold et al., 1998a.

The class identity is used as a response vector, Y, to describe the variation between the
sample classes. The OSC method then locates the longest vector describing the
variation between the samples which is not correlated with the Y-vector, and removes it
from the data matrix. The resultant dataset has been filtered to allow pattern recognition
focused on the variation correlated to features of interest within the sample population,
rather than on non-correlated, orthogonal variation.

OSC is a method for spectral filtering that solves the problem of unwanted systematic
variation in the spectra by removing components (latent variables) orthogonal to the
response calibrated against. In PLS, the weights, w, are calculated to maximise the
covariance between X and Y. In OSC, in contrast, the weights, w, are calculated to
minimise the covariance between X and Y, which is the same as calculating components
as close to orthogonal to Y as possible. These components, orthogonal to Y and
containing unwanted systematic variation, are then subtracted from the spectral data, X,
to produce a filtered predictor matrix describing the variation of interest. Briefly, OSC can
be described as a bilinear decomposition of the spectral matrix, X, into a set of scores,
T**, and a set of corresponding loadings, P**, containing variation orthogonal to the
response, Y. The unexplained part, or the residuals, E, is equal to the filtered X-matrix,
Xosc, which contains less unwanted variation. The decomposition is described by the
following equations:

$$X = T^{**}P^{**\prime} + E$$

$$X_{\mathrm{OSC}} = E$$

The OSC procedure starts by calculating the first latent variable, or principal
component, describing the variation in the data, X. The calculation is done according to
the NIPALS algorithm:

$$X = tp' + E$$

The first score vector, t, which is a summary of the between-sample variation in X, is
then orthogonalized against the response (Y), giving the orthogonalized score vector t*:

$$t^{*} = \left(I - Y(Y'Y)^{-1}Y'\right)t$$
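The projection in the equation above can be applied without forming the N x N matrix
I - Y(Y'Y)^-1 Y'. The NumPy sketch below (function name hypothetical) does this and can be
used to confirm that the result is orthogonal to Y.

```python
import numpy as np


def orthogonalize(t, Y):
    """t* = (I - Y (Y'Y)^-1 Y') t, computed without forming the N x N matrix."""
    t = np.asarray(t, dtype=float)
    Y = np.asarray(Y, dtype=float).reshape(len(t), -1)   # response as column(s)
    return t - Y @ np.linalg.solve(Y.T @ Y, Y.T @ t)
```

Applying the function twice returns the same vector, as expected for a projection.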

After orthogonalization, the PLS weights, w, are calculated with the aim of making
Xw = t*. By doing this, the weights, w, are set to minimize the covariance between X and
Y. The weights, w, are given by:
An estimate of the orthogonal score is calculated from:

$$t^{**} = Xw$$


The estimate, or updated score vector, t**, is then again orthogonalized to Y, and the
iteration proceeds until t** has converged. This ensures that t** converges towards the
longest vector orthogonal to the response Y that still gives a good description of the
variation in X. The data, X, can then be described as the score, t**, orthogonal to Y,
times the corresponding loading vector, p**, plus the unexplained part, the residual, E.

$$X = t^{**}p^{**\prime} + E$$

The residual, E, equals the filtered X, Xosc, after subtraction of the first component
orthogonal to the response Y:

$$E = X - t^{**}p^{**\prime}$$

$$X_{\mathrm{OSC}} = E$$

If more than one component needs to be removed, the same procedure is repeated
using the residual, E, as the starting data matrix, X.

New external data not present in the model calculation must be treated in the same way
as the modelling data were filtered. This is done by using the calculated weights, w, from
the filtering to calculate a score vector, tnew, for the new data, Xnew:

$$t_{\mathrm{new}} = X_{\mathrm{new}}w$$

By subtracting tnew times the loading vector from the calibration, p**, from the new
external data, Xnew, the residual, Enew, is obtained; this is the resulting OSC-filtered
matrix for the new external data:

$$E_{\mathrm{new}} = X_{\mathrm{new}} - t_{\mathrm{new}}p^{**\prime}$$
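The complete single-component procedure, including the treatment of new data, is sketched
below in NumPy under several stated assumptions: X and Y are mean-centred; the first score
vector is taken from an SVD rather than NIPALS (for the first component both give the same
score up to sign); and, because the explicit expression for w is not reproduced in this
text, w is obtained here as the regression vector of a small PLS1 model of t* on X with a
hypothetical `n_pls` components, so that Xw approximates t*. All function names are
hypothetical.

```python
import numpy as np


def _pls1_regression_vector(X, y, n_components):
    """Regression vector b (X @ b ~ y) from a NIPALS-style PLS1 fit; X, y centred."""
    X, y = X.copy(), y.copy()
    k = X.shape[1]
    W, P = np.zeros((k, n_components)), np.zeros((k, n_components))
    c = np.zeros(n_components)
    for a in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)
        t = X @ w
        tt = t @ t
        P[:, a], W[:, a], c[a] = X.T @ t / tt, w, (y @ t) / tt
        X -= np.outer(t, P[:, a])
        y -= c[a] * t
    return W @ np.linalg.solve(P.T @ W, c)


def osc_component(X, Y, n_pls=2, tol=1e-10, max_iter=100):
    """Remove one OSC component from mean-centred X; returns w, p** and E (= Xosc)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float).reshape(len(X), -1)

    def orth(v):  # (I - Y(Y'Y)^-1 Y') v
        return v - Y @ np.linalg.solve(Y.T @ Y, Y.T @ v)

    U, s, _ = np.linalg.svd(X, full_matrices=False)
    t = U[:, 0] * s[0]                                  # first principal component score
    for _ in range(max_iter):
        w = _pls1_regression_vector(X, orth(t), n_pls)  # weights with X w ~ t*
        t_new = X @ w                                   # t** = X w
        converged = np.linalg.norm(t_new - t) <= tol * np.linalg.norm(t_new)
        t = t_new
        if converged:
            break
    p = X.T @ t / (t @ t)                               # loading p**
    return w, p, X - np.outer(t, p)                     # E = X - t** p**'


def osc_apply(X_new, w, p):
    """Filter new data with the calibration w and p**: E_new = X_new - t_new p**'."""
    t_new = X_new @ w
    return X_new - np.outer(t_new, p)
```

Calling `osc_component` again on the returned E removes a further component, as described
above; new samples should be centred with the modelling-data means before `osc_apply` is
used.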

If PCA suggests separation between the classes under investigation, orthogonal signal
correction (OSC) can be used to optimize the separation, thus improving the
performance of subsequent multivariate pattern recognition analysis and enhancing the
predictive power of the model. In the examples described herein, both PCA and PLS-DA
analyses were improved by prior application of OSC.

An example of a typical OSC process includes the following steps: 

(a) NMR data are segmented using AMIX, normalised, and optionally scaled
and/or mean centered. The default for orthogonal filtering of spectral data is to use only
mean-centered data, which means that the mean of each variable (spectral bucket) is
subtracted from every value of that variable in the data matrix.

(b) a response vector (y) describing the class-separating variation is created by
assigning class membership to each sample.

(c) one latent variable orthogonal to the response vector (y) is removed according
to the OSC algorithm.

(d) if desired, the removed orthogonal variation can be viewed and interpreted in
terms of scores (T) and loadings (P).

(e) the filtered data matrix, which contains less variation not correlated to class
separation, is next used for further multivariate modelling after optional scaling and/or
mean centering.

Any particular model is only as good as the data used to formulate it. Therefore, it is
preferable that all modelling data and test data are obtained under the same (or similar)
conditions and using the same (or similar) experimental parameters. Such conditions
and parameters include, for example, sample type (e.g., plasma, serum), sample
collection and handling protocol, sample dilution, NMR analysis (e.g., type, field
strength/frequency, temperature), and data processing (e.g., referencing, baseline
correction, normalisation). If appropriate, it may be desirable to formulate models for a
particular sub-group of cases, e.g., according to any of the parameters mentioned above
(e.g., field strength/frequency), or others, such as sex, age, ethnicity, medical history,
lifestyle (e.g., smoker, non-smoker), and hormonal status (e.g., pre-menopausal, post-
menopausal).

In general, the quality of the model improves as the amount of modelling data increases.
Nonetheless, as shown in the examples below, even relatively small sets of modelling
data (e.g., about 50-100 subjects) are sufficient to achieve a confident classification (e.g.,
diagnosis).
A typical unsupervised modelling process includes the following steps:

(a) optionally scaling and/or mean centering modelling data;

(b) classifying data (e.g., as control or positive, e.g., diseased); 

(c) fitting the model (e.g., using PCA, PLS-DA); 
(d) identifying and removing outliers, if any;

(e) re-fitting the model; 

(f) optionally repeating (c), (d), and (e) as necessary. 

Optionally (and preferably), data filtering is performed following step (d) and before
step (e). Optionally (and preferably), orthogonal filtering (e.g., OSC) is performed
following step (d) and before step (e).

An example of a typical PLS-DA modelling process, using OSC filtered data, includes the 
following steps: 

(a) OSC-filtered data are optionally scaled and/or mean centered.

(b) a response vector (y) describing the class-separating variation is created by
assigning class membership to all samples.

(c) a PLS regression model is calculated between the OSC-filtered data and the
response vector (y). The calculated latent variables, or PLS components, will be focused
on describing maximum separation between the known classes.

(d) the model is interpreted by viewing scores (T), loadings (P), PLS weights (W),
PLS coefficients (B) and residuals (E). Together, they function as a means of
describing the separation between the classes as well as providing an explanation for the
observed separation.


Once the model has been calculated, it may be verified using data for samples of known
class which were not used to calculate the model. In this way, the ability of the model to
accurately predict classes may be tested. This may be achieved, for example, in the
method above, with the following additional step:

(e) a set of external samples, with known class membership, which were not used in
the (e.g., PLS) model calculation is used for validation of the model's predictive ability.
The prediction results are investigated, for example, in terms of predicted response
(ypred), predicted scores (Tpred), and predicted residuals described as predicted distance
to model (DModXpred).

The model may then be used to classify test data of unknown class. Before
classification, the test data are numerically pre-processed in the same manner as the
modelling data.
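The requirement that test data be pre-processed exactly like the modelling data can be made
concrete: the centring (and any scaling) statistics are estimated once, from the modelling
data, and then reused unchanged. The class below is a hypothetical minimal sketch assuming
mean-centring with optional unit-variance scaling; other pre-processing (normalisation, OSC
filtering) would be stored and reapplied in the same way.

```python
import numpy as np


class Preprocessor:
    """Mean-centring (and optional unit-variance scaling) fitted on modelling data only."""

    def __init__(self, scale=False):
        self.scale = scale

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        # With scale=True, constant columns (zero SD) are assumed not to occur.
        self.std_ = X.std(axis=0, ddof=1) if self.scale else np.ones(X.shape[1])
        return self

    def transform(self, X):
        # Test data use the modelling-data statistics, never their own.
        return (np.asarray(X, dtype=float) - self.mean_) / self.std_
```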

Interpreting the output from the pattern recognition (PR) analysis provides useful
information on the biomarkers responsible for the separation of the biological classes.
Of course, the PR output differs somewhat depending on the data analysis method used.
As mentioned above, methods for PR and interpretation of the results are known in the
art. Interpretation methods for two PR techniques (PCA and PLS-DA) are discussed
briefly herein.

Interpreting PCA Results 

The data matrix (X) is built up by N observations (samples, rats, patients, etc.) and K
variables (spectral buckets carrying the biomarker information in terms of ¹H-NMR
resonances).

In PCA, the N×K matrix (X) is decomposed into a few latent variables, or principal
components (PCs), describing the systematic variation in the data. Since PCA is a
bilinear decomposition method, each PC can be divided into two vectors, scores (t) and
loadings (p). The scores can be described as the projection of each observation onto
each PC, and the loadings as the contribution of each variable (spectral bucket) to the
PC, expressed in terms of direction.

Any clustering of observations (samples) along a direction found in scores plots (e.g.,
PC1 versus PC2) can be explained by identifying which variables (spectral buckets)
have high loadings for this particular direction in the scores. A high loading is defined as
a variable (spectral bucket) that changes between the observations in a systematic way,
showing a trend which matches the sample positions in the scores plot. Each spectral
bucket with a high loading, or a combination thereof, is defined by its NMR chemical
shift position; this is its diagnostic spectral window. These chemical shift values then
allow the skilled NMR spectroscopist to examine the original NMR spectra and identify
the molecules giving rise to the peaks in the relevant buckets; these are the biomarkers.
This is typically done using a combination of standard 1- and 2-dimensional NMR
methods.
If, in a scores plot, separation of two classes of sample can be seen in a particular
direction, then examination of those loadings which are in the same direction as in the
scores plot indicates which loadings are important for the class identification. The
loadings plot shows points which are labelled according to the bucket chemical shift.
This is the NMR spectroscopic chemical shift which corresponds to the centre of the
bucket. This bucket defines a diagnostic spectral window. Given a list of these bucket
identifiers, the skilled NMR spectroscopist then re-examines the NMR spectra and
identifies, within the bucket width, which of several possible NMR resonances are
changed between the two classes. The important resonance is characterised in terms of
exact chemical shift, intensity, and peak multiplicity. Using other NMR experiments,
such as 2-D NMR spectroscopy, and/or separation of the specific molecule using, for
example, HPLC-NMR-MS, other resonances from the same molecule are identified
and ultimately, on the basis of all of the NMR data and other data if appropriate, an
identification of the molecule (biomarker) is made.


In a classification situation as described herein, one procedure for finding relevant
biomarkers using PCA is as follows:

(a) PCA of the data matrix (X) containing N observations belonging to either of two
known classes (healthy or diseased). The description of the observations lies in the K
variables (spectral buckets) containing the biomarker information in terms of NMR
resonances.

(b) Interpretation of the scores (t) to find the direction for the separation between the two
known classes in X.

(c) Interpretation of loadings (p) reveals which variables (spectral buckets) have the 
largest impact on the direction for separation described in the scores (t). This identifies 
the relevant diagnostic spectral windows. 


(d) Assignment of the spectral buckets or combinations thereof to certain biomarkers.
This is done, for example, by interpretation of the resonances in NMR spectra and by
using previously assigned spectra of the same type as a library for assignments.
Interpreting PLS-DA Results

In PLS-DA, which is a regression extension of the PCA method, the options for
interpretation are more extensive than in the PCA case. PLS-DA performs a
regression between the data matrix (X) and a "dummy matrix" (Y) containing the class
membership information (e.g., samples may be assigned the value 1 for healthy and 2
for diseased classes). The calculated PLS components will describe the maximum
covariance between X and Y, which in this case is the same as maximum separation
between the known classes in X. The interpretation of scores (t) and loadings (p) is the
same in PLS-DA as in PCA. Interpretation of the PLS weights (w) for each component
provides an explanation of the variables in X correlated to the variation in Y. This gives
biomarker information for the separation between the classes.

Since PLS-DA is a regression method, the regression coefficients (b) can
also be used for discovery and interpretation of biomarkers. The regression coefficients
(b) in PLS-DA provide a summary of which variables in X (spectral buckets) are
most important in terms of both describing variation in X and correlating to Y. This means
that variables (spectral buckets) with high regression coefficients are important for
separating the known classes in X, since the Y matrix against which X is correlated only
contains information on the class identity of each sample.

Again, as discussed above, the scores plot is examined to identify important loadings,
diagnostic spectral windows, relevant NMR resonances, and ultimately the associated
biomarkers.


In a classification situation as described herein, one procedure for finding relevant
biomarkers using PLS-DA is as follows:

(a) A PLS model between the N×K data matrix (X) and a "dummy matrix" Y,
containing information on class membership for the observations in X, is calculated,
yielding a few latent variables (PLS components) describing maximum separation
between the two classes in X (e.g., healthy and diseased).

(b) Interpretation of the scores (t) to find the direction for the separation between the two
known classes in X.



wo 02/086501 



PCT/GB02/01862 




(c) Interpretation of loadings (p) revealing which variables (spectral buckets) have the 
largest impact on the direction for separation described in the scores (t); these are 
diagnostic spectral windows. 

In PLS-DA, a variable importance plot (VIP) is another method of evaluating the
significance of loadings in causing a separation of class of sample in a scores plot.
Typically, the VIP is a squared function of the PLS weights, and therefore only positive
numerical values are encountered; in addition, for a given model, there is only one set of
VIP values. Variables with a VIP value greater than 1 are considered most influential
for the model. The VIP shows each loading in decreasing order of importance for class
separation based on the PLS regression against the class variable.

A (w*c) plot is another diagnostic plot obtained from a PLS-DA analysis. It shows which
descriptors are mainly responsible for class separation. The (w*c) parameters are an
attempt to describe the total variable correlations in the model, i.e., between the
descriptors (e.g., NMR intensities in buckets), between the NMR descriptors and the
class variables, and between class variables if they exist (in the present two-class case,
where samples are assigned by definition to class 1 and class 2, there is no correlation).
Thus, for a situation in a scores plot (e.g., t1 vs. t2), if class 1 samples are clustered in
the upper right-hand quadrant and class 2 samples are clustered in the lower left-hand
quadrant, then the (w*c) plot will show descriptors also in these quadrants. Descriptors in
the upper right-hand quadrant are increased in class 1 compared to class 2, and vice
versa for the lower left-hand quadrant.

(d) Interpretation of PLS weights (w) reveals which variables (spectral buckets) in X are
important for correlation to Y (class separation); these, too, are diagnostic spectral
windows.

(e) Interpretation of the PLS regression coefficients (b) reveals an overall summary of
which variables (spectral buckets) have the largest impact on the direction for separation
described in the scores; these, too, are diagnostic spectral windows.

In a typical regression coefficient plot for NMR, each bar represents a spectral region
(e.g., 0.04 ppm) and shows how the NMR profile of one class of samples differs from
the NMR profile of a second class of samples. A positive value on the x-axis
indicates there is a relatively greater concentration of metabolite (assigned using NMR
chemical shift assignment tables) in one class as compared to the other class, and a
negative value on the x-axis indicates a relatively lower concentration in one class as
compared to the other class.

(f) Assignment of the spectral buckets or combinations thereof to certain biomarkers.
This is done, for example, by interpretation of the resonances in NMR spectra and by
using previously assigned spectra of the same type as a library for assignments.

Timed Sampling

The analysis methods described herein can be applied to a single sample or,
alternatively, to a timed series of samples. These samples may be taken relatively close
together in time (e.g., daily) or less frequently (e.g., monthly or yearly).

The timed series of samples may be used for one or more purposes, e.g., to make
sequential diagnoses, applying the same classification method as if each sample were a
single sample. This allows greater confidence in the diagnosis compared to obtaining
a single sample for the patient. Alternatively, the series may be used to monitor temporal
changes in the subject (e.g., changes in the underlying condition being diagnosed,
treated, etc.).

Alternatively, the timed series of samples can be collectively treated as a single dataset,
increasing the information density of the input dataset and hence increasing the power of
the analysis method to identify weaker patterns.

As yet another alternative, the timed series of samples can be collectively processed to
yield a single dataset in which the temporal changes (e.g., in each bin) are included as an
extra list of variables (e.g., as in composite data sets). Temporal changes in the amount
of (e.g., endogenous) diagnostic species may greatly improve the ability of the analysis
method to accurately classify patterns (especially when patterns are weak).
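One way of folding a timed series into a single composite row is sketched below: the binned
intensities at each time point are followed by the change in each bin between consecutive
time points. The choice of first differences as the temporal variables, and the function
name, are assumptions for illustration.

```python
import numpy as np


def temporal_composite(series):
    """Flatten one subject's timed series (time points x bins) into one composite row.

    The row holds the binned intensities at every time point, followed by the change
    in each bin between consecutive time points (assumed choice of temporal variable).
    """
    series = np.asarray(series, dtype=float)
    changes = np.diff(series, axis=0)
    return np.concatenate([series.ravel(), changes.ravel()])
```

Stacking such rows for several subjects (each with the same number of time points) gives a
matrix that can be modelled with PCA or PLS-DA like any other data set.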

Batch Modelling 

The methods described herein, including their applications (e.g., diagnosis, prognosis), 
may be further improved by employing batch modelling. 




Statistical batch processing can be divided into two levels of multivariate modelling. The
lower, or observation, level is usually based on Partial Least Squares (PLS)
regression against time (or any other index describing process maturity), whereas the
upper, or batch, level consists of a PCA based on the scores from the lower-level PLS
model. PLS can also be used in the upper level to correlate the matrix based on the
lower-level scores with the end properties of the separate batches. This is common in
industrial applications where properties of the end product are used as a description of
quality.

At the lower level of the batch modelling, the evolution of the studied process with time
(maturity) can be monitored and interpreted in terms of PLS scores and loadings. Since
PLS performs a regression against sampling time (maturity), the calculated
components will be focused on the evolution with time. The fact that the calculated PLS
components are orthogonal to each other means that it is possible to detect independent
time (maturity) profiles and also to interpret which measured variables are causing these
profiles. Confidence limits are used for detection of deviating behaviour of any spectra at
any time point at a chosen significance level, usually 95% and/or 99%.

The residuals, expressed as distance to model (DModX), are, at the lower level, another
important tool for detecting outlying batches or deviating behaviour for a specific batch at
a specific time point. The upper, or batch, level makes it possible to look only at the
differences between the separate batches. This is done by using the lower-level
scores, including all time points for each batch, as new variables describing each single
batch, and then performing a PCA on this new data matrix. The scores, loadings and
DModX are used in the same way as for ordinary PCA analysis, with the exception that
the upper-level loadings can be traced back down to the lower level for a more detailed
explanation in terms of the original loadings.
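The upper-level step can be sketched as follows, assuming the lower-level PLS scores are
already available as an array of shape (batches, time points, components) for synchronised
batches. The function name is hypothetical, and PCA is done here by SVD of the mean-centred
unfolded matrix.

```python
import numpy as np


def batch_level_pca(lower_scores, n_components=2):
    """Upper (batch) level: unfold lower-level scores and run PCA on the batch matrix.

    lower_scores has shape (n_batches, n_timepoints, A): the lower-level PLS scores
    of every batch at every (synchronised) time point.
    """
    S = np.asarray(lower_scores, dtype=float)
    n_batches = S.shape[0]
    B = S.reshape(n_batches, -1)          # one row per batch
    Bc = B - B.mean(axis=0)
    U, s, Vt = np.linalg.svd(Bc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T        # row j <-> time index j // A, component j % A
    return scores, loadings
```

Because each loading row maps back to one time point and one lower-level component, a large
upper-level loading can be traced down to the lower level, as described above.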

Predictions for "new" batches can be made on both levels of the batch model. On the
lower level, monitoring of evolution with time using scores and DModX is a powerful tool
for detecting deviation from normal behaviour for a batch at any time point. On the
upper level, prediction of single-batch behaviour can be made in terms of scores and
DModX.

The definition of a batch process, and also a requirement for batch modelling, is a
process where all batches have equal duration and are synchronised according to

e.g., as normal or abnormal. This may be achieved a pls

See, for example, Wold et al., 1998b and Eriksson et al., 1999.
Integrated Metabonomics

As discussed above, many of the methods of the present invention may also be applied
to composite data or composite data sets. The term "composite data set," as used
herein, pertains to a spectrum (or data vector) which comprises spectral data (e.g., NMR
spectral data, e.g., an NMR spectrum) as well as at least one other datum or data vector.
Examples of other data vectors include, e.g., one or more other NMR spectral data, e.g.,
NMR spectra, e.g., obtained for the same sample using a different NMR technique; other
types of spectra, e.g., mass spectra, numerical representations of images, etc.; data
obtained for another sample of the same sample type (e.g., blood, urine, tissue, tissue
extract), but obtained from the subject at a different timepoint; data obtained for another
sample of a different sample type (e.g., blood, urine, tissue, tissue extract) for the same
subject; and the like.

Examples of other data include, e.g., one or more clinical parameters. Clinical
parameters which are suitable for use in composite methods include, but are not limited
to, the following:

(a) established clinical parameters routinely measured in hospital clinical labs: age; sex;
body mass index; height; weight; family history; medication history; cigarette smoking;
alcohol intake; blood pressure; full blood cell count (FBCs); red blood cells; white blood
cells; monocytes; lymphocytes; neutrophils; eosinophils; basophils; platelets;
haematocrit; haemoglobin; mean corpuscular volume and related haemodilution
indicators; fibrinogen; functional clotting parameters (thromboplastin and partial
thromboplastin); electrolytes (sodium, potassium, calcium, phosphate); urea; creatinine;
total protein; albumin; globulin; bilirubin; protein markers of liver function (alanine
aminotransferase, alkaline phosphatase, gamma glutamyl transferase); glucose; HbA1c
(a measure of glucose-haemoglobin conjugates used to monitor diabetes); lipoprotein
profile; total cholesterol; LDL; HDL; triglycerides; blood group.


(b) established research parameters routinely measured in research laboratories but not
usually measured in hospitals: hormonal status; testosterone; estrogen; progesterone;
follicle stimulating hormone; inhibin; transforming growth factor-beta1; transforming
growth factor-beta2; chemokines; MCP-1; eotaxin; plasminogen activator inhibitor-1;
cystatin C.
(c) early-stage research parameters measured in one or a small number of specialist
labs: antibodies to sRII; antibodies to blood group A antigen; antibodies to blood group B
antigen; immunoglobulin (IgD) against alpha-gal; immunoglobulin (IgD) against penta-
gal.

Diagnostic Spectral Windows 

As discussed above, many of the methods of the present invention involve relating NMR
spectral intensity at one or more predetermined diagnostic spectral windows with a
predetermined condition.

Examples of methods for identifying one or more suitable diagnostic spectral windows for
a given condition, using, for example, pattern recognition methods, are described herein.

The term "diagnostic spectral window," as used herein, pertains to a narrow range of
chemical shift (Δδ) values encompassing an index value, δR (that is, δR falls within the
range Δδ). Each index value, and its associated spectral window, defines a range of
chemical shift (Δδ) in which the NMR spectral intensity is indicative of the presence of
one or more chemical species.

For 2D NMR methods, the diagnostic spectral window refers to a chemical shift patch
(Δδ1, Δδ2) which encompasses an index value, [δR1, δR2]. For 3D NMR methods, the
diagnostic spectral window refers to a chemical shift volume (Δδ1, Δδ2, Δδ3) which
encompasses an index value, [δR1, δR2, δR3].

In one embodiment, the spectral window is centred with respect to its index value (e.g.,
δr = 1.30; |Δδ| = δ 0.04; and Δδ = δ 1.28-1.32).
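
The centred window just described can be expressed in a few lines of code. The sketch
below is a minimal illustration only, assuming a 1D spectrum held as parallel lists of
chemical shifts (ppm) and intensities; the function names are hypothetical, and the
window intensity is approximated by summing the intensities of the data points inside
the window rather than by any particular integration routine used in this specification.

```python
from typing import Sequence, Tuple


def window_bounds(index_value: float, breadth: float) -> Tuple[float, float]:
    """Return the (low, high) chemical shifts of a window centred on index_value.

    breadth is the full width |delta-delta| in ppm, so the window extends
    breadth / 2 on either side of the index value.
    """
    half = breadth / 2.0
    return index_value - half, index_value + half


def window_intensity(shifts: Sequence[float], intensities: Sequence[float],
                     index_value: float, breadth: float) -> float:
    """Sum the intensities of all data points whose shift lies in the window.

    The lower bound is inclusive and the upper bound exclusive, so adjacent
    windows of equal breadth do not count the same data point twice.
    """
    low, high = window_bounds(index_value, breadth)
    return sum(y for x, y in zip(shifts, intensities) if low <= x < high)


# Usage: a window with index value 1.30 and breadth 0.04 spans 1.28 to 1.32.
shifts = [1.270, 1.285, 1.295, 1.305, 1.315, 1.325]
intensities = [5.0, 1.0, 2.0, 4.0, 2.0, 5.0]
print(window_bounds(1.30, 0.04))
print(window_intensity(shifts, intensities, 1.30, 0.04))  # 9.0 (1 + 2 + 4 + 2)
```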

The breadth of the range, |Δδ|, is determined largely by the spectroscopic parameters,
such as field strength/frequency, temperature, sample viscosity, etc. The breadth of the
range is often chosen to encompass a typical spin-coupled multiplet pattern. For peaks
whose position varies with sample pH, the breadth of the range may be widened to
encompass the expected range of positions.

Typically, the breadth of the range, |Δδ|, is from about δ 0.001 to about δ 0.2.
In one embodiment, the breadth is from about δ 0.005 to about δ 0.1.
In one embodiment, the breadth is from about δ 0.005 to about δ 0.08.
In one embodiment, the breadth is from about δ 0.01 to about δ 0.08.
In one embodiment, the breadth is from about δ 0.02 to about δ 0.08.
In one embodiment, the breadth is from about δ 0.005 to about δ 0.06.
In one embodiment, the breadth is from about δ 0.01 to about δ 0.06.
In one embodiment, the breadth is from about δ 0.02 to about δ 0.06.
In one embodiment, the breadth is about δ 0.04.

In one embodiment, the breadth is equal to the "bucket" or "bin" width. In one
embodiment, the breadth is equal to an integer multiple of the "bucket" or "bin" width.
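
The relationship between a window breadth and the "bucket" width can be illustrated by
dividing a 1D spectrum into contiguous buckets of equal width and summing the intensity
in each. This is a sketch under stated assumptions: the function name is hypothetical,
buckets start at a chosen lower chemical shift, and the 0.04 ppm width and 0.00 to
0.12 ppm range in the usage example are illustrative values only.

```python
from typing import Dict, Sequence


def bucket_spectrum(shifts: Sequence[float], intensities: Sequence[float],
                    start: float, stop: float, width: float) -> Dict[float, float]:
    """Integrate a 1D spectrum into contiguous buckets of equal width.

    Returns a mapping from each bucket's centre (rounded to 4 decimal places so
    it can be used as a stable key) to the summed intensity of the data points
    inside it. Points outside [start, stop) are ignored.
    """
    n_buckets = int(round((stop - start) / width))
    totals = [0.0] * n_buckets
    for x, y in zip(shifts, intensities):
        if start <= x < stop:
            # Clamp guards against floating-point overshoot at the top edge.
            k = min(int((x - start) / width), n_buckets - 1)
            totals[k] += y
    return {round(start + (k + 0.5) * width, 4): totals[k] for k in range(n_buckets)}


# Usage: three 0.04 ppm buckets covering 0.00 to 0.12 ppm.
buckets = bucket_spectrum([0.01, 0.03, 0.05, 0.07, 0.09], [1.0, 2.0, 3.0, 4.0, 5.0],
                          start=0.0, stop=0.12, width=0.04)
print(buckets)  # {0.02: 3.0, 0.06: 7.0, 0.1: 5.0}
```

Where the window breadth is an integer multiple of the bucket width, the window
intensity is the sum of the adjacent bucket totals it covers; for example,
buckets[0.02] + buckets[0.06] gives the intensity of a 0.08 ppm window centred on 0.04 ppm.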

Although the diagnostic spectral windows are determined in relation to the condition
under study, the precise index values for such windows may vary in accordance with the
experimental parameters employed, for example, the digital resolution in the original
spectra, the width of the buckets used, the temperature of the spectral data acquisition,
etc. The exact composition of the sample (e.g., biofluid, tissue, etc.) can affect peak
positions by compartmentation, metal complexation, protein-small molecule binding, etc.
The observation frequency will have an effect because of different degrees of peak
overlap and of the first- or second-order nature of the spectra.


In one embodiment, said one or more predetermined diagnostic spectral windows is: a
single predetermined diagnostic spectral window.

In one embodiment, said one or more predetermined diagnostic spectral windows is: a
plurality of predetermined diagnostic spectral windows. In practice, this may be
preferred.

Although the theoretical limit on the number of predetermined diagnostic spectral
windows is a function of the data density (e.g., the number of variables, e.g., buckets),
typically the number of predetermined diagnostic spectral windows is from 1 to about 30.
It is possible for the actual number to be in any sub-range within these general limits.
Examples of lower limits include 1, 2, 3, 4, 5, 6, 8, 10, and 15. Examples of upper limits
include 3, 4, 5, 6, 8, 10, 15, 20, 25, and 30.






In one embodiment, the number is from 1 to about 20.
In one embodiment, the number is from 1 to about 15.
In one embodiment, the number is from 1 to about 10.
In one embodiment, the number is from 1 to about 8.
In one embodiment, the number is from 1 to about 6.
In one embodiment, the number is from 1 to about 5.
In one embodiment, the number is from 1 to about 4.
In one embodiment, the number is from 1 to about 3.
In one embodiment, the number is 1 or 2.

In one embodiment, said one or more predetermined diagnostic spectral windows is: a
plurality of diagnostic spectral windows; and, said NMR spectral intensity at one or more
predetermined diagnostic spectral windows is: a combination of a plurality of NMR
spectral intensities, each of which is NMR spectral intensity for one of said plurality of
predetermined diagnostic spectral windows.

In one embodiment, said combination is a linear combination.
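
A linear combination of spectral intensities can be sketched as a weighted sum of the
intensities in each window plus an offset. The weights, offset, threshold, and class
labels below are hypothetical placeholders chosen for illustration; in practice such
coefficients would come from a fitted model, which this sketch does not perform.

```python
from typing import Mapping


def linear_score(window_intensities: Mapping[float, float],
                 weights: Mapping[float, float], offset: float = 0.0) -> float:
    """Weighted sum of NMR spectral intensities from several windows.

    Both mappings are keyed by window index value (ppm). A weighted window with
    no measured intensity raises KeyError rather than silently counting as zero.
    A positive weight rewards an increase in that window; a negative weight
    rewards a decrease.
    """
    return offset + sum(w * window_intensities[idx] for idx, w in weights.items())


def classify(score: float, threshold: float = 0.0) -> str:
    """Hypothetical two-class decision rule applied to the combined score."""
    return "class A" if score > threshold else "class B"


# Usage with hypothetical weights for two windows; the third window is unused.
intensities = {1.30: 5.0, 3.42: 4.0, 2.00: 100.0}
score = linear_score(intensities, weights={1.30: 2.0, 3.42: -1.0}, offset=0.5)
print(score, classify(score))  # 6.5 class A
```

Opposite signs on the weights allow an increase in one window and a decrease in another
to push the combined score in the same direction.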

In one embodiment, at least one of said one or more predetermined diagnostic spectral
windows encompasses a chemical shift value for an NMR resonance of a diagnostic
species (e.g., a ¹H NMR resonance of a diagnostic species).


In one embodiment, each of a plurality of said one or more predetermined diagnostic
spectral windows encompasses a chemical shift value for an NMR resonance of a
diagnostic species (e.g., a ¹H NMR resonance of a diagnostic species).

In one embodiment, each of said one or more predetermined diagnostic spectral
windows encompasses a chemical shift value for an NMR resonance of a diagnostic
species (e.g., a ¹H NMR resonance of a diagnostic species).

Diagnostic Spectral Windows - Osteoporosis


It is believed that the index values, and the associated diagnostic spectral windows,
primarily reflect the species described in Table 1-OP and/or Table 2-OP.

In one embodiment, said predetermined diagnostic spectral windows are defined by one
or more index values, δr, corresponding to the bucket regions listed in Table 1-OP and/or
Table 2-OP.
In one embodiment, said predetermined diagnostic spectral windows are defined by one
or more index values, δr, corresponding to the bucket regions listed in Table 1-OP and/or
Table 2-OP, and a breadth of the range, |Δδ|, of about δ 0.04.


In one embodiment, said predetermined diagnostic spectral windows are defined by one
or more index values, δr, corresponding to the bucket regions listed in Table 1-OP and/or
Table 2-OP, and which are determined using the conditions set forth in the section
entitled "NMR Experimental Parameters."


In one embodiment, at least one of said one or more predetermined diagnostic spectral
windows encompasses a chemical shift value for an NMR resonance of free proline
(e.g., a ¹H NMR resonance of free proline).

In one embodiment, each of a plurality of said one or more predetermined diagnostic
spectral windows encompasses a chemical shift value for an NMR resonance of free
proline (e.g., a ¹H NMR resonance of free proline).

In one embodiment, each of said one or more predetermined diagnostic spectral
windows encompasses a chemical shift value for an NMR resonance of free proline
(e.g., a ¹H NMR resonance of free proline).

The ¹H NMR chemical shifts for free proline in acidic, neutral, and basic aqueous
solution are shown below. Note that the two protons of each CH2 group should have
distinct ¹H NMR chemical shifts, because of the presence of the chiral centre. These are
resolved for the β- and δ-CH2 groups (i.e., β-CH2 and β'-CH2; δ-CH2 and δ'-CH2), but not
for the γ-CH2 group. See, for example, Fan, 1996.
¹H NMR Chemical Shifts for Free Proline

Proton      δH (acid)   δH (neutral)   δH (basic)   Multiplicity
α-CH        4.45        4.14           3.46         triplet
β-CH2       2.41        2.36           2.12         multiplet
β'-CH2      2.17        2.08           1.72         multiplet
γ-CH2       2.06        2.01           1.72         multiplet
δ-CH2       3.42        3.40           2.74         triplet
δ'-CH2      3.42        3.33           3.02         triplet
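
As an illustration of how the tabulated proline shifts might be consulted when assigning
a selected window, the sketch below stores those values and reports which free-proline
protons fall inside a window centred on a given index value. The data structure and
function name are illustrative assumptions; only the shift values come from the table.

```python
from typing import Dict, List

# 1H chemical shifts (ppm) for free proline, copied from the table above.
PROLINE_SHIFTS: Dict[str, Dict[str, float]] = {
    "alpha-CH":   {"acid": 4.45, "neutral": 4.14, "basic": 3.46},
    "beta-CH2":   {"acid": 2.41, "neutral": 2.36, "basic": 2.12},
    "beta'-CH2":  {"acid": 2.17, "neutral": 2.08, "basic": 1.72},
    "gamma-CH2":  {"acid": 2.06, "neutral": 2.01, "basic": 1.72},
    "delta-CH2":  {"acid": 3.42, "neutral": 3.40, "basic": 2.74},
    "delta'-CH2": {"acid": 3.42, "neutral": 3.33, "basic": 3.02},
}


def protons_in_window(index_value: float, breadth: float, condition: str) -> List[str]:
    """Return the proline protons whose shift lies within the centred window.

    condition is one of 'acid', 'neutral' or 'basic'. Bounds are inclusive,
    so a resonance exactly on the edge of the window is reported.
    """
    low, high = index_value - breadth / 2.0, index_value + breadth / 2.0
    return [proton for proton, shifts in PROLINE_SHIFTS.items()
            if low <= shifts[condition] <= high]


print(protons_in_window(3.40, 0.04, "neutral"))  # ['delta-CH2']
print(protons_in_window(1.72, 0.04, "basic"))    # ["beta'-CH2", 'gamma-CH2']
```

A single window can contain resonances from more than one proton, as in the basic case
where β'-CH2 and γ-CH2 coincide at δ 1.72, and the strong pH dependence of these shifts is
one reason windows for such peaks may need to be widened.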



Diagnostic Species and Biomarkers



The index values, and the associated diagnostic spectral windows, define ranges of
chemical shift in which NMR spectral intensity is indicative of the presence of one or
more chemical species, one or more of which are diagnostic species (e.g., biomarkers),
for example, for a condition (e.g., indication) under study.

In one embodiment, said one or more diagnostic species are endogenous diagnostic 
species. 



In one embodiment, said one or more diagnostic species are associated with NMR
spectral intensity at predetermined diagnostic spectral windows.

In one embodiment, said one or more diagnostic species are a plurality of diagnostic
species (i.e., a combination of diagnostic species).

In one embodiment, said one or more diagnostic species is a single diagnostic species. 

The term "endogenous species," as used herein, pertains to chemical species which
originated from the subject under study, for example, which were present in the sample
of the subject.
Once an index value, and its associated diagnostic spectral window, is identified (e.g., by
the application of modelling methods as described herein), it is often possible to identify
one or more putative biomarkers which give rise to NMR spectral intensity in that
particular window.


The (e.g., integrated) NMR spectral intensity in a particular spectral window
(e.g., bucket) is the sum of the spectral intensity for all of the NMR peaks in that window.
Usually, for small molecules which give sharp NMR peaks, it is possible to examine the
raw NMR data and determine which of the peaks is responsible for that particular
spectral window being selected as significant by the applied pattern recognition method.
The relevant peak(s) are then assigned.

Such assignments may be made, for example, by reference to published data; by
comparison with spectra of authentic materials; by standard addition of an authentic
reference standard to the sample; or by separating the individual component, e.g., by
using HPLC-NMR, and identifying it using NMR and mass spectrometry. Additional
confirmation of assignments is usually sought from the application of other NMR
methods, including, for example, 2-dimensional (2D) NMR methods.

In another approach, concentrations of candidate chemical species are measured by
another specific method (e.g., ELISA, chromatography, RIA, etc.) and compared with the
spectral intensity observed in the relevant diagnostic spectral window, and any
correlation is noted. This will reveal how much of the variance in the diagnostic spectral
window is contributed by the candidate chemical species. This may also reveal that
suspected diagnostic species are, in fact, not highly correlated with the condition under
examination.
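
The comparison described above can be sketched as a Pearson correlation, across samples,
between the independently measured concentration of a candidate species and the intensity
in the relevant window; the square of the correlation coefficient is the fraction of the
window's variance explained by a linear relationship with the candidate. This is a minimal
illustration with made-up values and hypothetical function names, not the specific
statistical procedure of this specification.

```python
from math import sqrt
from statistics import mean
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # ZeroDivisionError if either sequence is constant


def variance_explained(concentration: Sequence[float],
                       window_intensity: Sequence[float]) -> float:
    """Fraction of window-intensity variance explained by a linear fit (r squared)."""
    return pearson_r(concentration, window_intensity) ** 2


# Usage with made-up values for four samples (e.g., one ELISA result per sample).
conc = [1.0, 2.0, 3.0, 4.0]
window = [10.5, 19.0, 31.0, 39.5]
print(round(variance_explained(conc, window), 3))  # 0.995
```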

Methods of Identifying Diagnostic Species

Thus, the methods described herein also facilitate the identification of species (often
referred to as biomarkers or diagnostic species) which are indicative (e.g., diagnostic) of
a particular condition. For example, particular metabolites (e.g., in blood, urine, etc.)
may be diagnostic of a particular condition.

One aspect of the present invention pertains to a method of identifying such diagnostic
species (e.g., biomarkers), as described herein.



wo 02/086501 






PCT/GB02/01862 



condition, said method comprising the steps of:

[...] predetermined condition [...] predetermined condition;

and:

(b) identifying one or more critical experimental parameters [...];

or:

(b) identifying a combination of a plurality of critical experimental parameters
significant for discriminating between [...]
In one embodiment, one or more of said critical experimental parameters is a spectral
parameter (i.e., a critical experimental spectral parameter); and said identifying and
matching steps are:

(b) identifying one or more critical experimental spectral parameters; and,
(c) matching each of one or more of said one or more critical experimental
spectral parameters with a spectral feature, e.g., a spectral peak; and
matching one or more of said spectral peaks with said diagnostic species;

or:

(b) identifying a combination of a plurality of critical experimental spectral
parameters; and,
(c) matching each of a plurality of said plurality of critical experimental spectral
parameters with a spectral feature, e.g., a spectral peak; and
matching one or more of said spectral peaks with said combination of a plurality
of diagnostic species.

In one embodiment, said multivariate statistical analysis method is a multivariate 
statistical analysis method which employs a pattern recognition method. 


In one embodiment, said multivariate statistical analysis method is, or employs, PCA.

In one embodiment, said multivariate statistical analysis method is, or employs, PLS.

In one embodiment, said multivariate statistical analysis method is, or employs, PLS-DA.
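
For readers unfamiliar with PCA, the following dependency-free sketch computes the first
principal component of a bucketed data matrix (rows are samples, columns are buckets) by
mean-centring the columns and applying power iteration to the sample covariance matrix.
It is a teaching illustration under stated assumptions (hypothetical function name, fixed
start vector, fixed iteration count); practical work would normally use an established
numerical library, and PLS, PLS-DA and OSC are not shown.

```python
import math
from typing import List, Sequence, Tuple


def first_principal_component(data: Sequence[Sequence[float]],
                              n_iter: int = 500) -> Tuple[List[float], List[float]]:
    """Return (loadings, scores) for the first principal component.

    data: rows are samples, columns are variables (e.g. bucket intensities);
    at least two rows are required. The loading vector has unit length and its
    overall sign is arbitrary.
    """
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in data]
    # Sample covariance matrix (p x p) of the mean-centred data.
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(p)]
           for a in range(p)]
    # Fixed, non-symmetric start vector; power iteration only fails if it is
    # exactly orthogonal to the leading eigenvector.
    v = [1.0 / (j + 1) for j in range(p)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # the data have no variance along the current vector
        v = [x / norm for x in w]
    scores = [sum(r[j] * v[j] for j in range(p)) for r in centred]
    return v, scores


# Usage: four samples whose two buckets vary in opposite directions.
loadings, scores = first_principal_component([[0, 0], [1, -1], [2, -2], [3, -3]])
print([round(x, 3) for x in loadings])  # [0.707, -0.707]
```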

In one embodiment, said multivariate statistical analysis method includes a step of data 
filtering. 

In one embodiment, said multivariate statistical analysis method includes a step of
orthogonal data filtering.

In one embodiment, said multivariate statistical analysis method includes a step of OSC.

In one embodiment, said experimental parameters comprise spectral data.
In one embodiment, said experimental parameters comprise both spectral data and
non-spectral data (and is referred to as a composite experimental [...]).

In one embodiment, said experimental parameters comprise NMR spectral data.

[...]

In one embodiment, said NMR spectral data comprises ¹H NMR spectral data.
In one embodiment, said non-spectral data is non-spectral clinical data.
In one embodiment, said non-NMR spectral data is non-spectral clinical data.

In one embodiment, said critical experimental parameters are spectral parameters.

In one embodiment, said class group comprises classes associated with said
predetermined condition (e.g., presence, absence, degree, etc.).

In one embodiment, said class group comprises exactly two classes.

[...] presence of said predetermined condition; and absence of said predetermined
condition.

[...] associated with the presence of said predetermined condition.

[...] associated with the absence of said predetermined condition.

[...] (d) confirming the identity of said diagnostic species.
One aspect of the present invention pertains to novel diagnostic species (e.g., biomarkers)
which are identified by such a method.

One aspect of the present invention pertains to one or more diagnostic species
(e.g., biomarkers) which are identified by such a method for use in a method of
classification (e.g., diagnosis).

One aspect of the present invention pertains to a method of classification
(e.g., diagnosis) which employs or relies upon one or more diagnostic species
(e.g., biomarkers) which are identified by such a method.

One aspect of the present invention pertains to use of one or more diagnostic species 
(e.g., biomarkers) which are identified by such a method in a method of classification 
(e.g., diagnosis). 


One aspect of the present invention pertains to an assay for use in a method of 
classification (e.g., diagnosis), which assay relies upon one or more diagnostic species 
(e.g., biomarkers) which are identified by such a method. 

One aspect of the present invention pertains to use of an assay in a method of
classification (e.g., diagnosis), which assay relies upon one or more diagnostic species
(e.g., biomarkers) which are identified by such a method.

Diagnostic Species - Osteoporosis 


In one embodiment, at least one of said one or more predetermined diagnostic species is 
a species described in Table 1-OP and/or Table 2-OP. 

In one embodiment, each of a plurality of said one or more predetermined diagnostic
species is a species described in Table 1-OP and/or Table 2-OP.

In one embodiment, each of said one or more predetermined diagnostic species is a
species described in Table 1-OP and/or Table 2-OP.
As discussed above, many of the methods of the present invention involve classification
on the basis of an amount, or a relative amount, of one or more diagnostic species.

In one embodiment, said classification is performed on the basis of an amount, or a
relative amount, of a single diagnostic species.

In one embodiment, said classification is performed on the basis of an amount, or a 
relative amount, of a plurality of diagnostic species. 

In one embodiment, said classification is performed on the basis of an amount, or a
relative amount, of each of a plurality of diagnostic species.

In one embodiment, said classification is performed on the basis of a total amount, or a 
relative total amount, of a plurality of diagnostic species. 

In one embodiment (wherein said one or more diagnostic species is: a plurality of
diagnostic species), said amount of, or relative amount of, one or more diagnostic
species is: a combination of a plurality of amounts, or relative amounts, each of which is
the amount of, or relative amount of, one of said plurality of diagnostic species.

In one embodiment, said combination is a linear combination. 

The term "amount," as used in this context, pertains to the amount regardless of the
terms of expression.

The term "amount," as used herein in the context of "amount of, or relative amount of
(e.g., diagnostic) species," pertains to the amount regardless of the terms of expression.

Absolute amounts may be expressed, for example, in terms of mass (e.g., μg), moles
(e.g., μmol), volume (e.g., μL), concentration (molarity, μg/mL, μg/g, wt%, vol%, etc.), etc.

Relative amounts may be expressed, for example, as ratios of absolute amounts (e.g.,
as a fraction, as a multiple, as a %) with respect to another chemical species. For
example, the amount may be expressed as a relative amount, relative to an internal
standard, for example, another chemical species which is endogenous or added.
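
One widely used relationship for expressing a ¹H NMR signal as an amount relative to an
internal standard is to normalise each integral by the number of protons giving rise to
it. The sketch below applies that relationship under the assumptions of a fully relaxed
spectrum and non-overlapping signals; the function names and the example values (a
3-proton signal measured against a 9-proton standard signal) are illustrative, and no
particular internal standard is implied.

```python
def relative_molar_amount(analyte_integral: float, analyte_protons: int,
                          standard_integral: float, standard_protons: int) -> float:
    """Moles of analyte per mole of internal standard.

    In a fully relaxed 1H spectrum the integral of a signal is proportional to
    (number of molecules) x (number of equivalent protons in the signal), so
    each integral is first normalised per proton.
    """
    if analyte_protons <= 0 or standard_protons <= 0:
        raise ValueError("proton counts must be positive")
    return (analyte_integral / analyte_protons) / (standard_integral / standard_protons)


def absolute_concentration(relative_amount: float, standard_concentration: float) -> float:
    """Convert a relative amount to a concentration, given the standard's concentration."""
    return relative_amount * standard_concentration


rel = relative_molar_amount(6.0, 3, 9.0, 9)
print(rel)                               # 2.0
print(absolute_concentration(rel, 0.5))  # 1.0 (same units as the standard's concentration)
```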

The amount may be indicated indirectly, in terms of another quantity (possibly a
precursor quantity) which is indicative of the amount. For example, the other quantity
may be a spectrometric or spectroscopic quantity (e.g., signal, intensity, absorbance,
transmittance, extinction coefficient, conductivity, etc.; optionally processed, e.g.,
integrated) which is itself indicative of the amount.

The amount may be indicated, directly or indirectly, in regard to a different chemical
species (e.g., a metabolic precursor, a metabolic product, etc.), which is indicative of the
amount.

Diagnostic Shift 


As discussed above, many of the methods of the present invention involve classification
on the basis of a modulation, e.g., of NMR spectral intensity at one or more
predetermined diagnostic spectral windows; of the amount, or a relative amount, of
diagnostic species; etc. In this context, "modulation" pertains to a change, and may be,
for example, an increase or a decrease. In one embodiment, said "a modulation of" is
"an increase or decrease in."

In one embodiment, the modulation (e.g., increase, decrease) is at least 10%, as
compared to a suitable control. In one embodiment, the modulation (e.g., increase,
decrease) is at least 20%, as compared to a suitable control. In one embodiment, the
modulation is a decrease of at least 50% (i.e., a factor of 0.5). In one embodiment, the
modulation is an increase of at least 100% (i.e., a factor of 2).
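
The thresholds in the preceding paragraph can be expressed relative to a control value.
The sketch below, with hypothetical function names and illustrative values, reports the
percentage change and checks it against the criteria given there (a change of at least
10% or 20%, a decrease to a factor of 0.5 or less, or an increase to a factor of 2 or
more). How the control value itself is obtained is outside the scope of the sketch.

```python
def percent_change(test_value: float, control_value: float) -> float:
    """Signed change of test_value relative to control_value, in percent."""
    if control_value == 0:
        raise ValueError("control value must be non-zero")
    return 100.0 * (test_value - control_value) / control_value


def describe_modulation(test_value: float, control_value: float,
                        min_percent: float = 10.0) -> str:
    """Label a change as 'increase', 'decrease' or 'no modulation'.

    A change counts only if its magnitude is at least min_percent
    (use min_percent=20.0 for the stricter criterion).
    """
    change = percent_change(test_value, control_value)
    if abs(change) < min_percent:
        return "no modulation"
    return "increase" if change > 0 else "decrease"


def meets_fold_criterion(test_value: float, control_value: float) -> bool:
    """True for a decrease to at most 0.5x, or an increase to at least 2x, the control."""
    ratio = test_value / control_value
    return ratio <= 0.5 or ratio >= 2.0


# Usage with illustrative values against a control value of 10.0.
print(describe_modulation(12.0, 10.0))                    # increase
print(describe_modulation(11.5, 10.0, min_percent=20.0))  # no modulation
print(meets_fold_criterion(4.0, 10.0))                    # True
```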

Each of a plurality of predetermined diagnostic spectral windows, and each of a plurality
of diagnostic species, may have independent modulations, which may be the same or
different. For example, if there are two predetermined diagnostic spectral windows,
NMR spectral intensity may increase in one window and decrease in the other window.
In this way, combinations of modulations of NMR spectral intensity in different diagnostic
spectral windows may be diagnostic. Similarly, if there are two diagnostic species, the
amount of one may increase, and the amount of the other may decrease. Again,
combinations of modulations of amounts, or relative amounts, of different diagnostic
species may be diagnostic. See, for example, the data in the Examples below, which
illustrate cases where different species have different modulations.

The term "diagnostic shift," as used herein, pertains to a modulation (e.g., increase,
decrease), as compared to a suitable control.

A diagnostic shift may be in regard to, for example, NMR spectral intensity at one or
more predetermined diagnostic spectral windows; or the amount of, or relative amount
of, diagnostic species.

Control Samples, Control Subjects, Control Data

Suitable controls are usually selected on the basis of the organism (e.g., subject, patient)
under study (test subject, study subject, etc.), and the nature of the study (e.g., type of
sample, type of spectra, etc.). Usually, controls are selected to represent the state of
"normality." As described herein, deviations from normality (e.g., higher than normal,
lower than normal) in test data, test samples, test subjects, etc. are used in classification,
diagnosis, etc.
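
One simple way of quantifying a deviation from normality is to express a test value in
units of the control population's standard deviation (a z-score). This is an illustrative
choice rather than a procedure prescribed here; the function names are hypothetical and
the cut-off of 2 standard deviations is an assumed default.

```python
from statistics import mean, stdev
from typing import Sequence


def z_score(test_value: float, control_values: Sequence[float]) -> float:
    """Deviation of test_value from the control mean, in control SD units."""
    if len(control_values) < 2:
        raise ValueError("need at least two control values")
    sd = stdev(control_values)
    if sd == 0:
        raise ValueError("control values have zero spread")
    return (test_value - mean(control_values)) / sd


def deviation_label(test_value: float, control_values: Sequence[float],
                    cutoff: float = 2.0) -> str:
    """Label a test value relative to the control population (cut-off is an assumption)."""
    z = z_score(test_value, control_values)
    if z >= cutoff:
        return "higher than normal"
    if z <= -cutoff:
        return "lower than normal"
    return "within normal range"


controls = [9.0, 10.0, 11.0]
print(z_score(13.0, controls))         # 3.0
print(deviation_label(7.0, controls))  # lower than normal
```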

For example, in most cases, control subjects are the same species as the test subject
and are chosen to be representative of the equivalent normal (e.g., healthy) organism. A
control population is a population of control subjects. If appropriate, control subjects may
have characteristics in common (e.g., sex, ethnicity, age group, etc.) with the test
subject. If appropriate, control subjects may have characteristics (e.g., age group, etc.)
which differ from those of the test subject. For example, it may be desirable to choose
healthy 20-year-olds of the same sex and ethnicity as the study subject as control
subjects.

In most cases, control samples are taken from control subjects. Usually, control samples
are of the same sample type (e.g., serum), and are collected and handled (e.g., treated,
processed, stored) under the same or similar conditions, as the sample under study
(e.g., test sample, study sample).

In most cases, control data (e.g., control values) are obtained from control samples
which are taken from control subjects. Usually, control data (e.g., control data sets,
control spectral data, control spectra, etc.) are of the same type (e.g., 1-D NMR, etc.),
and are collected and handled (e.g., recorded, processed) under the same or similar
conditions (e.g., parameters), as the test data.



Implementation 


The methods of the present invention, or parts thereof, may be conveniently performed
electronically, for example, using a suitably programmed computer system.



One aspect of the present invention pertains to a computer system or device, such as a
computer or linked computers, operatively configured to implement a method of the
present invention, as described herein.

One aspect of the present invention pertains to computer code suitable for implementing 
a method of the present invention, as described herein, on a suitable computer system. 


One aspect of the present invention pertains to a computer program comprising 
computer program means adapted to perform a method according to the present 
invention, as described herein, when said program is run on a computer. 

One aspect of the present invention pertains to a computer program, as described
above, embodied on a computer readable medium.

One aspect of the present invention pertains to a data carrier which carries computer
code suitable for implementing a method of the present invention, as described herein,
on a suitable computer.



In one embodiment, the above-mentioned computer code or computer program includes, 
or is accompanied by, computer code and/or computer readable data representing a 
predictive mathematical model, as described herein. 


In one embodiment, the above-mentioned computer code or computer program includes,
or is accompanied by, computer code and/or computer readable data representing data
from which a predictive mathematical model, as described herein, may be calculated.

One aspect of the present invention pertains to computer code and/or computer readable
data representing a predictive mathematical model, as described herein.
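
As a minimal sketch of computer readable data representing a predictive mathematical
model, assuming for illustration a linear model over spectral windows, the coefficients
can be stored as JSON and reloaded to score new data. The class name and the field names
("offset", "weights") are hypothetical, and no particular file format is prescribed by
this specification.

```python
import json
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class LinearModel:
    """Hypothetical linear predictive model over spectral windows."""
    offset: float
    weights: Dict[float, float] = field(default_factory=dict)  # index value (ppm) -> coefficient

    def predict(self, window_intensities: Dict[float, float]) -> float:
        """Weighted sum of the window intensities plus the offset."""
        return self.offset + sum(w * window_intensities[k] for k, w in self.weights.items())

    def to_json(self) -> str:
        # JSON object keys must be strings, so index values are stored as text.
        return json.dumps({"offset": self.offset,
                           "weights": {str(k): v for k, v in self.weights.items()}})

    @classmethod
    def from_json(cls, text: str) -> "LinearModel":
        raw = json.loads(text)
        return cls(offset=float(raw["offset"]),
                   weights={float(k): float(v) for k, v in raw["weights"].items()})


model = LinearModel(offset=0.5, weights={1.30: 2.0, 3.42: -1.0})
restored = LinearModel.from_json(model.to_json())
print(restored == model)                         # True
print(restored.predict({1.30: 5.0, 3.42: 4.0}))  # 6.5
```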
One aspect of the present invention pertains to a data carrier which carries computer
code and/or computer readable data representing a predictive mathematical model, as
described herein.

One aspect of the present invention pertains to a computer system or device, such as a 
computer or linked computers, programmed or loaded with computer code and/or 
computer readable data representing a predictive mathematical model, as described 
herein. 

Computers may be linked, for example, internally (e.g., on the same circuit board, on
different circuit boards which are part of the same unit), by cabling (e.g., networking,
Ethernet, internet), using wireless technology (e.g., radio, microwave, satellite link, cell-
phone), etc., or by a combination thereof.

Examples of data carriers and computer readable media include chip media (e.g., ROM,
RAM, flash memory (e.g., Memory Stick™, Compact Flash™, Smartmedia™)), magnetic
disk media (e.g., floppy disks, hard drives), optical disk media (e.g., compact disks
(CDs), digital versatile disks (DVDs), magneto-optical (MO) disks), and magnetic tape
media.

Although the ¹H NMR spectra analysed here were generated using a conventional (and
hence large and expensive) 600 MHz NMR spectrometer, ongoing technological
advances suggest that spectrometers of similar resolving power may soon be available
as desktop units (provided the sample to be analysed is small, as is the case with
plasma or serum samples). Such units, together with a personal computer to perform
automated pattern recognition, may soon be available not only in large hospitals but also
in the primary healthcare milieu.

One aspect of the present invention pertains to a system (e.g., an "integrated analyser",
"diagnostic apparatus") which comprises:

(a) a first component comprising a device for obtaining NMR spectral intensity
data for a sample (e.g., an NMR spectrometer, e.g., a Bruker iNCA 500 MHz); and,

(b) a second component comprising a computer system or device, such as a
computer or linked computers, operatively configured to implement a method of the
present invention, as described herein, and operatively linked to said first component.
In one embodiment, the first and second components are in close proximity, e.g., so as
to form a single console, unit, system, etc. In one embodiment, the first and second
components are remote (e.g., in separate rooms, in separate buildings).


A simple process for the use of such a system is described below. 

In a first step, a sample (e.g., blood, urine, etc.) is obtained from a subject, for example,
by a suitably qualified medical technician, nurse, etc., and the sample is processed as
required. For example, a blood sample may be drawn, and subsequently processed to
yield a serum sample, within about three hours.

In a second step, the sample is appropriately processed (e.g., by dilution, as described
herein), and an NMR spectrum is obtained for the sample, for example, by a suitably
qualified NMR technician. Typically, this would require about fifteen minutes.

In a third step, the NMR spectrum is analysed and/or classified using a method of the
present invention, as described herein. This may be performed, for example, using a
computer system or device, such as a computer or linked computers, operatively
configured to implement the methods described herein. In one embodiment, this step is
performed at a location remote from the previous step. For example, an NMR
spectrometer located in a hospital or clinic may be linked, for example, by Ethernet,
internet, or wireless connection, to a remote computer which performs the
analysis/classification. If appropriate, the result is then forwarded to the appropriate
destination, e.g., the attending physician. Typically, this would require about fifteen
minutes.

Applications

The methods described herein can be used in the analysis of chemical, biochemical, and
biological data.

The methods described herein provide powerful means for the diagnosis and prognosis
of disease, for assisting medical practitioners in providing optimum therapy for disease,
and for understanding the benefits and side-effects of xenobiotic compounds, thereby
aiding the drug development process.
Furthermore, the methods described herein can be applied in a non-medical setting,
such as in post mortem examinations, forensic science, and the analysis of complex
chemical mixtures other than mammalian cells or biofluids.

Examples of these and other applications of the methods described herein include, but
are not limited to, the following:

Medical Diagnostic Applications

(a) Early detection of abnormality/problem. For example, the technique can be used to
identify subjects suffering from cerebral edema immediately on arrival in the acute
emergency department of a hospital. At present, when patients present with head
trauma, it is difficult to tell whether cerebral edema will be a problem; as a result, it may
not be possible to intervene until clinical symptoms of cerebral edema become evident,
which may be too late to save the patient.

In a similar example, patients arriving at acute emergency departments can be screened
for internal bleeding and organ rupture, to facilitate early surgical intervention.

In a third example, the methods described herein can be used to identify a clinically
silent disease (e.g., low bone mineral density (e.g., osteoporosis); infection with
Helicobacter pylori) prior to the onset of clinical symptoms (e.g., fracture; development of
ulcers).

(b) Diagnosis (identification of disease), especially cheap, rapid, and non-invasive
diagnosis. For example, the methods described herein can be used to replace treadmill
exercise tests, echocardiograms, electrocardiograms, and invasive angiography as the
collective method for the identification of coronary heart disease. Since the current tests
for coronary heart disease are slow, expensive, and invasive (with associated morbidity
and mortality), the methods described herein offer significant advantages.

(c) Differential diagnosis, e.g., classification of disease, severity of disease, etc., for
example, the ability to distinguish patients with coronary artery disease affecting 1, 2, or
all 3 coronary arteries (see example below); the ability to distinguish disease at different
anatomical sites, e.g., in the left coronary artery versus the circumflex artery, or in the
carotid arteries as opposed to the coronary arteries.

(d) Population targeting. A condition (e.g., coronary heart disease, osteoporosis) may be
clinically silent for many years prior to an acute event (e.g., heart attack, bone fracture),
which may have significant associated morbidity or mortality. Drugs may exist to help
prevent the acute event (e.g., statins for heart disease, bisphosphonates for
osteoporosis), but often they cannot be efficiently targeted at the population level. For a
test to be useful for population screening, it must be cheap and non-invasive. The
methods described herein are ideally suited to population screening. Screens for
multiple diseases with a single blood sample (e.g., osteoporosis, heart disease, and
cancer) further improve the cost/benefit ratio for screening.

(e) Classification, fingerprinting, and diagnosis of metabolic diseases (e.g., inborn errors
of metabolism).

(f) Identifying, classifying, determining the progress of, and monitoring the treatment of,
infectious diseases.

(g) Characterization and identification of drugs used in overdose. For example, a patient
may be unconscious following an overdose and/or the nature of the drug taken in
overdose may not be known. The methods described herein can be used to
characterise the biological consequences of the overdose and to rapidly identify
candidate agents, facilitating rapid intervention to reverse the effects. Thus, an overdose
of opioids could rapidly be countered with naloxone.

(h) Characterization and identification of poisons, and the metabolic or biological
consequences of poisoning. Many victims of poisoning (e.g., children) are unaware of
the nature of the substance they have taken. Furthermore, the subject may be
unconscious or unable to communicate. The methods described herein can be used to
characterise the biological consequences of the poisoning and to rapidly identify
candidate poisons. This would facilitate administration of the appropriate antidote, which
typically must be done as quickly as possible after exposure to (e.g., ingestion of) the
toxic substance.

Medical Prognosis Applications

(a) Prognosis (prediction of future outcome), including, for example, analysis of "old"
samples to effect retrospective prognosis. For example, a sample can be used to
assess the risk of myocardial infarction among sufferers of angina, permitting a more
aggressive therapeutic strategy to be applied to those at greatest risk of progressing to a
heart attack.

(b) Risk assessment, to identify people at risk of suffering from a particular indication.
The methods described herein can be used for population screening (as for diagnosis),
but in this case to screen for the risk of developing a particular disease. Such an
approach will be useful where an effective prophylaxis is known but must be applied prior
to the development of the disease in order to be effective. For example,
bisphosphonates are effective at preventing bone loss in osteoporosis, but they do not
increase pathologically low bone mineral density. Ideally, therefore, these drugs are
applied before any bone loss occurs. This can only be done with a technique which
facilitates prediction of future disease (prognosis). The methods described herein can be
used to identify those people at high risk of losing bone mineral density in the future, so
that prophylaxis may begin prior to disease inception.






(c) Antenatal screening for a wide range of disease susceptibilities. The methods
described herein can be used to analyse blood or tissue drawn from a pre-term foetus
(e.g., during chorionic villus sampling or amniocentesis) for the purposes of antenatal
screening.

Aids to Therapeutic Intervention

(a) Therapeutic monitoring, e.g., to monitor the progress of treatment. For example, by
making serial diagnostic tests, it will be possible to determine whether and to what extent
the subject is returning to normal following initiation of a therapeutic regimen.

(b) Patient compliance, e.g., monitoring patient compliance with therapy. Patient
compliance is often very poor, particularly with therapies that have significant
side-effects. Patients often claim to comply with the therapeutic regimen, but this may not
always be the case. The methods described herein permit patient compliance to be
monitored, both by directly measuring the drug concentration and by examining the
biological consequences of the drug. Thus, the methods described herein offer
significant advantages over existing methods of monitoring compliance (such as
measuring plasma concentrations of the drug), since the patient may take the drug just
prior to the investigation while having failed to comply for the previous weeks or months.
By monitoring the biological consequences of therapy, it is possible to assess long-term
compliance.

(c) Toxicology, including sophisticated monitoring of any adverse reactions suffered, e.g.,
on a patient-by-patient basis. This will facilitate investigation of idiosyncratic toxicity.
Some patients may suffer real, clinically significant side-effects from a therapy which
were not seen in the majority. Application of the methods described herein facilitates
rapid identification of these rare, idiosyncratic toxicities so that the therapy can be
discontinued or modified as appropriate. Such an approach allows the therapy to be
tailored to the individual metabolism of each patient.


(d) The methods described herein can be used for "pharmacometabonomics," in analogy
to pharmacogenomics, e.g., subjects could be divided into "responders" and
"nonresponders" using the metabonomic profile as evidence of "response," and features
of the metabonomic profile could then be used to target future patients who would likely
respond to a particular therapeutic course. For example, patients given statins could be
monitored using the methods described herein for beneficial changes in the subtle
composition of the lipoproteins which are associated with coronary heart disease. On
this basis, the patients could be categorised as "statin responsive" or "statin
unresponsive". In a second stage, the methods described herein could be re-applied to
the untreated metabonomic fingerprint to identify pattern elements which predict future
responses to statins. Thus, the clinician would know whether other patients should be
treated with statins, without having to wait weeks or months to assess the outcome.

Tools for Drug Development 


(a) Clinical evaluations of drug therapy and efficacy. As for therapeutic monitoring, the
methods described herein can be used as one end-point in clinical trials for efficacy of
new therapies. The extent to which sequential diagnostic fingerprints move towards
normal can be used as one measure of the efficacy of the candidate therapy.



(b) Detection of toxic side-effects of drugs and model compounds (e.g., in the drug
development process and in clinical trials). For example, it will be possible to identify the
major sites of toxic effects (e.g., liver, kidney, etc.) for new treatments during Phase I
studies, as well as to identify idiosyncratic toxicities during later-stage clinical trials.

(c) Improvement in the quality control of transgenic animal models of disease; aiding the
design of transgenic models of disease. Transgenic models of various diseases have
been useful for the preclinical development of new therapies. Although a transgenic
model may recapitulate many of the phenotypic markers of the human disease, it is often
unclear whether similar biochemical mechanisms underlie the resulting phenotype.

(d) Other animal models of disease. For example, injection of bovine type II collagen
into mice has often been used as a model of rheumatoid arthritis, resulting in joint swelling
and autoantibodies, but the mechanisms resulting in the phenotype have little in common
with the human disease. As a result, therapies which are effective in the animal model
may be ineffective in man. The methods described herein can be used to examine the
metabolic and phenotypic consequences of gene manipulation or other interventions
used to yield an animal model of disease, to compare those with the metabolic and
phenotypic changes characteristic of the disease in man, and thereby to validate a range
of animal models of human diseases.

(e) Searching for new biochemical markers of disease and/or tissue or organ damage.
For example, the NMR bin around δ 3.22 was identified as being particularly associated
with coronary heart disease (see examples below), and the associated species has been
identified as a novel metabolic marker of coronary heart disease which may be
amenable to therapeutic intervention.



Commercial and Other Non-Medical Applications



(a) Commercial classification for actuarial assessment, to address the commercial need
for insurance companies to assess future risk of disease. Examples include the
provision of health insurance and general life cover. This application is similar to
prognostic assessment and risk assessment in population screening, except that the
purpose is to provide accurate actuarial information.
(b) Clinical trial enrolment, to address the commercial need for the ability to select
individuals suffering from, or at risk of suffering from, a particular condition for enrolment
in clinical trials. For example, at present, to perform a clinical trial to assess the efficacy
of a drug intended to prevent heart disease it would be necessary to enrol at least 4,000
subjects and follow them for 4 years. If it were possible to select individuals who were
suffering from heart disease, it is estimated that it would be possible to use 400 subjects
followed for 2 years, reducing the cost by 25-fold or more.

(c) Characterization and identification of illicit drugs, and the metabolic or biological
consequences of substance abuse. As for monitoring patient compliance with desired
therapeutics, the methods described herein can be used to examine the metabolic
consequences of illegal substance abuse, permitting confirmation of the use of the
substance even if none of the substance or its metabolites are present in the system at
the time of investigation. This defeats the strategy of using proscribed substances
chronically but temporarily suspending their use to avoid being identified. This
application could be applied to identification of habitual users of illegal drugs (such as
heroin, cocaine, amphetamines, etc.) for police use, or for monitoring use of banned
substances in sports (e.g., to detect use of anabolic steroids among athletes).

(d) Application to pathology and post-mortem studies. For example, the methods
described herein could be used to identify the proximate cause of death in a subject
undergoing post-mortem examination.

(e) Application to forensic science. For example, the methods described herein can be
used to identify the metabolic consequences of a range of actions on a subject (who may
be either dead or alive at the time of the investigation). For example, the methods
described herein can be applied to identify metabolic consequences of asphyxiation,
poisoning, sexual arousal, or fear.

(f) Analysis of samples other than mammalian cells or biofluids. For example, the
methods described herein can be applied to a panel of wines, classified by experts for
their quality. By recognising patterns associated with good quality, the methods
described herein can be used by wine manufacturers during the preparation of blends,
as well as by wine purchasers to facilitate a rapid and independent assessment of the
quality of a given wine.
(g) The methods described herein can also be used to identify (known or novel)
genotypes and/or phenotypes, and to determine an organism's phenotype or genotype.
This may assist with the choice of a suitable treatment or facilitate assessment of its
relevance in a drug development process. For example, the generation of metabonomic
data in panels of individuals with disease states, infected states, or undergoing treatment
may indicate response profiles of groups of individuals which can be differentiated into
two or more subgroups, indicating that an allelic genetic basis for response to the
disease, state, or treatment exists. For example, a particular phenotype may not be
susceptible to treatment with a certain drug, while another phenotype may be susceptible
to treatment. Conversely, one phenotype might show toxicity because of a failure to
metabolise and hence excrete a drug, which drug might be safe in another phenotype
that does not exhibit this effect. For example, metabonomic methods can be used to
determine the acetylator status of an organism: there are two phenotypes, corresponding
to "fast" and "slow" acetylation of drug metabolites. Phenotyping can be achieved on the
basis of the urine alone (i.e., without dosing a xenobiotic), or on the basis of urine
following dosing with a xenobiotic which has the potential for acetylation (e.g.,
galactosamine). Similar methods can also be used to determine other differences, such
as other enzymatic polymorphisms, for example, cytochrome P450 polymorphism.

As shown below, the methods described herein can be used successfully to discriminate 
between twins, whether identical twins or non-identical twins. 

The methods described herein may also be used in studies of the biochemical
consequences of genetic modification, for example, in "knock-out" animals where one or
more genes have been removed or made non-functional; in "knock-in" animals where
one or more genes have been incorporated from the same or a different species; and in
animals where the number of copies of a gene has been increased (as in the model
which results in the over-expression of the beta-amyloid protein in mouse brain as a
model for Alzheimer's disease). Genes can be transferred between bacterial, plant, and
animal species.

The combination of genomic, proteomic, and metabonomic data sets into comprehensive
"bionomic" systems may permit a holistic evaluation of perturbed in vivo function.

The methods described herein may be used as an alternative or adjunct to other 
methods, e.g., the various genomic, pharmacogenomic, and proteomic methods. 



WO 02/086501    PCT/GB02/01862



EXAMPLES 

The following examples are provided solely to illustrate the present invention and are
not intended to limit the scope of the present invention, as described herein.

Example 1 
Osteoporosis 

As discussed above, the inventors have developed novel methods (which employ
multivariate statistical analysis and pattern recognition (PR) techniques, and optionally
data filtering techniques) of analysing data (e.g., NMR spectra) from a test population
which yield accurate mathematical models which may subsequently be used to classify a
test sample or subject, and/or in diagnosis.


These techniques have been applied to the analysis of blood serum in the context of
osteoporosis. The metabonomic analysis can distinguish between individuals with and
without osteoporosis. Novel diagnostic biomarkers for osteoporosis have been
identified, and methods for associated diagnosis have been developed.


Briefly, metabonomic methods were applied to blood serum samples from subjects in an
osteoporosis study. Biomarkers, including free proline, were identified as being
diagnostic for osteoporosis. Subsequently, proline levels were used to classify
(e.g., diagnose) patients, specifically, by using predictive mathematical models which
take account of free proline levels.

Collection of NMR Spectra 

Analysis was performed on serum samples collected from subjects under study. Serum
was taken from control subjects (n=40) and patients with osteoporosis (n=29) prior to a
formal diagnosis of bone disease.

The data were classified as "control" (triangle, ▲) or "osteoporosis" (circle, •).

Osteoporosis was diagnosed according to the bone mineral density (BMD) of the lumbar
spine (LS), which was expressed as a Z-score. Osteoporosis in a subject was
diagnosed using the World Health Organisation (WHO) definition of osteoporosis as a
BMD below a cut-off value 1.5 standard deviations (SDs) below the age- and sex-matched
mean (i.e., a Z-score of -1.5 or below), or by the presence of spinal fractures (see, e.g.,
World Health Organisation, 1994). Control subjects had a Z-score above this cut-off value
and no history of fractures.
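The grouping rule just described can be sketched as follows. This is a minimal illustration only, assuming that each subject's lumbar-spine BMD and the age- and sex-matched reference mean and SD are already known. The `Subject` record and its field names are hypothetical. Labelling subjects who meet neither definition (Z-score above the cut-off and no spinal fracture, but some other fracture history) as "excluded" is also an assumption.

```python
from dataclasses import dataclass

Z_CUTOFF = -1.5  # 1.5 SD below the age- and sex-matched mean, as in the text


@dataclass
class Subject:  # hypothetical record; field names are illustrative
    bmd: float            # lumbar-spine BMD (e.g. g/cm^2)
    ref_mean: float       # age- and sex-matched reference mean BMD
    ref_sd: float         # reference standard deviation
    spinal_fracture: bool = False
    any_fracture_history: bool = False


def z_score(s: Subject) -> float:
    """Deviation of BMD from the matched mean, in units of SD."""
    return (s.bmd - s.ref_mean) / s.ref_sd


def classify(s: Subject) -> str:
    """Apply the study's grouping rule to one subject."""
    if z_score(s) <= Z_CUTOFF or s.spinal_fracture:
        return "osteoporosis"
    if not s.any_fracture_history:
        return "control"
    # Above the cut-off but with a non-spinal fracture history:
    # meets neither group definition (assumed to be excluded).
    return "excluded"


print(classify(Subject(bmd=0.80, ref_mean=1.00, ref_sd=0.10)))  # Z = -2.0
```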

Blood was drawn from each patient, allowed to clot in plastic tubes for 2 hours at room
temperature, and the serum was collected by centrifugation. Aliquots of serum were
stored at -80°C until assayed.

Prior to NMR analysis, samples (150 µl) were diluted with solvent solution (10% D2O v/v,
0.9% NaCl w/v) (350 µl). The diluted samples were then placed in 5 mm high-quality
NMR tubes (Goss Scientific Instruments Ltd).

Conventional 1-D ¹H NMR spectra of the blood serum samples were measured on a
Bruker DRX-600 spectrometer using the conditions set forth in the section entitled "NMR
Experimental Parameters."

NMR Experimental Parameters 

(a) General: 

Samples were NON-SPINNING in the spectrometer 

Temperature: 300 K 

Operating Frequency: 600.22 MHz 

Spectral Width: 8389.3 Hz 

Number of data points (TD): 32K 

Number of scans: 64 

Number of dummy scans: 4 (once only, before the start of the acquisition)

Acquisition time: 1.95 s

(b) Pulse Sequence:

noesypr1d (Bruker standard noesypresat sequence, as listed in their manual):
RD - 90° - t1 - 90° - tm - 90° - FID

Relaxation delay (RD): 1.5 s

Fixed interval (t1): 4 µs

Mixing time (tm): 150 ms

90° pulse length: 10.9 µs

Total recycle period: 3.6 s

Secondary irradiation at the water resonance during RD and tm
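As a consistency check on the parameters listed, the acquisition time and recycle period can be recomputed. The sketch assumes the Bruker convention that TD counts real and imaginary points together, with a dwell time of 1/(2·SW) per point. It also neglects the three ~11 µs pulse widths. Under these assumptions it gives an acquisition time of about 1.95 s and a recycle period of about 3.6 s, which agree with the stated values.

```python
SW_HZ = 8389.3      # spectral width (Hz)
TD = 32 * 1024      # "32K" data points (real + imaginary)
SF_MHZ = 600.22     # operating frequency (MHz)
RD_S = 1.5          # relaxation delay (s)
T1_S = 4e-6         # fixed interval t1 (s)
TM_S = 0.150        # mixing time (s)
NS, DS = 64, 4      # number of scans and dummy scans


def acquisition_time(td: int, sw_hz: float) -> float:
    """AQ = TD * DW, with dwell time DW = 1 / (2 * SW)."""
    return td / (2.0 * sw_hz)


def recycle_period(rd: float, t1: float, tm: float, aq: float) -> float:
    """RD + t1 + tm + AQ; the 90-degree pulse widths are ignored."""
    return rd + t1 + tm + aq


aq = acquisition_time(TD, SW_HZ)
recycle = recycle_period(RD_S, T1_S, TM_S, aq)
sweep_ppm = SW_HZ / SF_MHZ                 # 1 ppm corresponds to SF_MHZ Hz
total_min = (NS + DS) * recycle / 60.0     # dummy scans run once, before acquisition

print(f"acquisition time  {aq:.3f} s")
print(f"recycle period    {recycle:.2f} s")
print(f"spectral width    {sweep_ppm:.2f} ppm")
print(f"experiment time   {total_min:.1f} min")
```

The last line suggests that each spectrum required roughly four minutes of spectrometer time.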

(c) Phase Cycling

The phases of the RF pulses and of the receiver were cycled on successive scans to
remove artefacts according to the following scheme, where PH1 refers to the first 90°
pulse, PH2 to the second, PH3 to the third, and PH31 to the phase of the receiver. In
the following scheme:

0 denotes 0° phase increment

1 denotes 90° phase increment

2 denotes 180° phase increment

3 denotes 270° phase increment

PH1 = 0 2

PH2 = 0 0 0 0 0 0 0 0 2 2 2 2 2 2 2 2

PH3 = 0 0 2 2 1 1 3 3

PH31 = 0 2 2 0 1 3 3 1 2 0 0 2 3 1 1 3


(d) Processing of the FIDs:

This was done using XWINNMR (version 2.1, Bruker GmbH, Germany).
Automatic zero fill x 2 at the end of the FID.

Line broadening by multiplying the FID by a negative exponential equivalent to a line
broadening of +0.3 Hz.
Fourier transform.
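The three FID operations can be sketched with NumPy. This is a generic illustration, not XWINNMR's implementation. It assumes a complex FID with one complex point every 1/SW seconds. It applies the exponential multiplication exp(-π·LB·t), which adds LB Hz to the Lorentzian linewidth, doubles the length with trailing zeros, and Fourier transforms the result. Bruker digital-filter handling and phase correction are omitted.

```python
import numpy as np


def process_fid(fid, sw_hz, lb_hz=0.3, zero_fill=2):
    """Apodize, zero fill and Fourier transform a complex FID.

    fid: complex samples spaced 1/sw_hz seconds apart.
    Returns the spectrum with zero frequency in the centre (fftshift).
    """
    fid = np.asarray(fid, dtype=complex)
    n = fid.size
    t = np.arange(n) / sw_hz                      # acquisition times (s)
    apodized = fid * np.exp(-np.pi * lb_hz * t)   # exponential line broadening
    padded = np.zeros(zero_fill * n, dtype=complex)
    padded[:n] = apodized                         # zeros appended after the FID
    return np.fft.fftshift(np.fft.fft(padded))


def frequency_axis(n_points, sw_hz):
    """Frequencies (Hz, relative to the carrier) matching the fftshifted spectrum."""
    return np.fft.fftshift(np.fft.fftfreq(n_points, d=1.0 / sw_hz))


# Usage: a synthetic decaying signal 100 Hz from the carrier.
sw, n = 1000.0, 1024
t = np.arange(n) / sw
spectrum = process_fid(np.exp(2j * np.pi * 100.0 * t - t / 0.5), sw)
freqs = frequency_axis(spectrum.size, sw)
print(freqs[np.argmax(np.abs(spectrum))])
```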

(e) Processing of the NMR spectra:

This was done using XWINNMR (version 2.1, Bruker GmbH, Germany).
Spectrum peak phase adjusted manually using the zero- and first-order parameters
PHC0, PHC1.

Baseline corrected manually using the command "basl." This allows the subtraction of
baselines of various degrees of polynomial. The simplest is to subtract a constant to
remove a DC offset, and this was sufficient in the present case. In other cases, it can be
necessary to subtract a straight line of adjustable slope or to subtract a baseline defined
The possibility exists within the software for functions up to



Once properly phased and baseline corrected, the full spectra showed a flat featureless
baseline on both sides of the main set of signals (i.e., outside the range δ 0 to 10), and
the peaks of interest showed a clear in-phase absorption profile.

NMR chemical shifts in the spectra were defined relative to that of the lactate methyl
group (the middle of the doublet, taken to be at δ 1.33).
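Referencing to lactate amounts to shifting the chemical-shift axis so that the centre of the CH3 doublet falls at δ 1.33. The sketch assumes that the two line positions of the doublet have already been read off the spectrum in Hz, and that the frequency axis increases with chemical shift. The function names and example frequencies are hypothetical.

```python
import numpy as np

LACTATE_CH3_PPM = 1.33  # reference value used in the text


def doublet_centre(f1_hz: float, f2_hz: float) -> float:
    """Midpoint of the two lines of the lactate CH3 doublet."""
    return 0.5 * (f1_hz + f2_hz)


def hz_to_ppm(freqs_hz, sf_mhz: float, ref_hz: float,
              ref_ppm: float = LACTATE_CH3_PPM) -> np.ndarray:
    """Map a frequency axis (Hz) to chemical shift so that ref_hz lands at ref_ppm.

    Assumes frequencies increase with chemical shift; 1 ppm equals sf_mhz Hz.
    """
    return (np.asarray(freqs_hz, dtype=float) - ref_hz) / sf_mhz + ref_ppm


# Usage with hypothetical line positions picked from a spectrum.
ref = doublet_centre(1000.0, 1007.0)
print(hz_to_ppm([ref, ref + 600.22], sf_mhz=600.22, ref_hz=ref))
```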


(f) Reduction of the NMR spectra to descriptors

The NMR spectra in the region δ 10 to δ 0.2 were segmented into 245 regions or
"buckets" of equal width (δ 0.04) using AMIX (Analysis of Mixtures software, version
2.5, Bruker, Germany). The integral of the spectrum in each segment was calculated. In
order to remove the effects of variation in the suppression of the water resonance, and
also the effects of variation in the urea signal caused by partial cross-saturation
via solvent-exchanging protons, the region δ 6.0 to 4.5 was set to zero integral. The
following AMIX profile was used:

command=bucket_1d_table
input_file=<namesfile>
output_file=<mydata.amix>
left_ppm=10
right_ppm=0.2
exclude1_left_ppm=6.0
exclude1_right_ppm=4.5
exclude2_left_ppm= (intentionally undefined)
exclude2_right_ppm= (intentionally undefined)
bucket_width=0.04
bucket_mode3=0
bucket_scale_mode=3
bucket_multiplier=0.01
bucket_output_format=2
normalization_region_left=10
normalization_region_right=0.2
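AMIX is proprietary software. The following is a minimal NumPy sketch of the same reduction, assuming a real, phased spectrum on a uniformly spaced ppm axis. It integrates by summing points and multiplying by the point spacing, which may differ from AMIX's own integration. The function name and arguments are illustrative and are not AMIX parameters.

```python
import numpy as np


def bucket_spectrum(ppm, intensity, left=10.0, right=0.2, width=0.04,
                    exclude=((6.0, 4.5),)):
    """Integrate a 1D spectrum into equal-width chemical-shift buckets.

    Returns (bucket_centres, integrals), ordered from `left` to `right`.
    Points inside each `exclude` region are zeroed before integration.
    Assumes a uniformly spaced ppm axis.
    """
    ppm = np.asarray(ppm, dtype=float)
    y = np.asarray(intensity, dtype=float).copy()
    for ex_left, ex_right in exclude:
        y[(ppm <= ex_left) & (ppm >= ex_right)] = 0.0   # water/urea region
    n_buckets = int(round((left - right) / width))      # 245 for the defaults
    centres = left - width * (np.arange(n_buckets) + 0.5)
    inside = (ppm <= left) & (ppm > right)
    idx = np.floor((left - ppm[inside]) / width).astype(int)
    idx = np.clip(idx, 0, n_buckets - 1)                # guard against round-off
    spacing = abs(ppm[1] - ppm[0])
    integrals = np.bincount(idx, weights=y[inside], minlength=n_buckets) * spacing
    return centres, integrals


# Usage on a flat synthetic spectrum sampled every 0.001 ppm.
ppm_axis = np.linspace(10.5, 0.0, 10501)
centres, integrals = bucket_spectrum(ppm_axis, np.ones_like(ppm_axis))
```

With these defaults, the bucket centres fall at values such as 1.34, 3.38, and 3.22 ppm. These match the bucket labels used in the tables of this example.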
The integral data were normalized to the total spectral area using Excel (Microsoft,
USA). Intensity was integrated over all included regions, and each region was then
divided by the total integral and multiplied by a constant (i.e., 100, so that final integrated
intensities are expressed as percentages of the total intensity).

The normalized data were then exported to the SIMCA-P (version 8.0 Umetrics, 
Sweden) software package and each descriptor was mean-centered. All subsequent 
analysis was therefore performed on normalised mean-centered data. 
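In the study, these two preprocessing steps were carried out in Excel and SIMCA-P. The following NumPy sketch shows an equivalent calculation for illustration and does not reproduce either program. `mean_centre` also returns the column means. That way, samples predicted later can be centred with the training-set means, which is a common practice and not a detail given in the text.

```python
import numpy as np


def normalise_to_total_area(buckets):
    """Express each bucket as a percentage of its spectrum's total integral.

    Rows are spectra and columns are buckets. Buckets zeroed by the water/urea
    exclusion contribute nothing to the total.
    """
    b = np.asarray(buckets, dtype=float)
    return 100.0 * b / b.sum(axis=1, keepdims=True)


def mean_centre(X):
    """Subtract each descriptor's mean across samples; also return the means."""
    X = np.asarray(X, dtype=float)
    means = X.mean(axis=0)
    return X - means, means


X_norm = normalise_to_total_area([[1, 1, 2], [2, 2, 4]])
X_centred, col_means = mean_centre(X_norm)
```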

Data Analysis

A Principal Components Analysis (PCA) model was calculated from the 1D NMR
spectra of serum samples from control subjects (▲) and patients with osteoporosis (•).
The corresponding scores and loadings plots are shown in Figure 1A-OP and Figure
1B-OP, respectively. The regions of the NMR spectrum responsible for causing
separation between the different samples are also indicated in Figure 1B-OP.
Separation between controls and osteoporosis is evident in PC2, with control samples
dominating the lower two quadrants and osteoporosis samples dominating the upper two
quadrants.
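The PCA step can be illustrated with a singular value decomposition of the normalised, mean-centred bucket matrix. SIMCA-P uses its own NIPALS-type routine. Assuming no further scaling, this sketch should reproduce the same scores and loadings, although the sign of each component may differ.

```python
import numpy as np


def pca(Xc, n_components=2):
    """PCA of a mean-centred matrix (samples x buckets) via SVD.

    Returns scores T (samples x A), loadings P (buckets x A) and the
    fraction of total variance explained by each of the A components.
    """
    Xc = np.asarray(Xc, dtype=float)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]   # scores (for the scores plot)
    P = Vt[:n_components].T                      # loadings (for the loadings plot)
    explained = s[:n_components] ** 2 / np.sum(s ** 2)
    return T, P, explained


# Usage on random stand-in data; a scores plot would show T[:, 0] against T[:, 1].
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 6))
Xc = X - X.mean(axis=0)
T, P, explained = pca(Xc, n_components=6)
```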


A Principal Components Analysis (PCA) model was also calculated from the 1D NMR
spectra of serum samples from control subjects (▲) and patients with osteoporosis (•),
but in this case, prior to PCA, the data were filtered by application of orthogonal signal
correction (OSC), which removes variation that is not correlated to class and
therefore improves subsequent data analysis. The corresponding scores and loadings
plots are shown in Figure 1C-OP and Figure 1D-OP, respectively.

The improved separation between the control and osteoporosis samples is evident, with
controls dominating the left-hand side of the plot and osteoporosis samples dominating
the right-hand side. Note also that application of OSC results in maximum variation
being observed in PC1 rather than in PC2.
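OSC has several published variants, and the text does not state which settings were used. As an illustration only, this sketch removes a single OSC component by following the outline of the original algorithm of Wold and co-workers (1998). It starts from the first principal component score, orthogonalises that score to the class vector y, finds X-weights that reproduce it, and deflates X. In place of the inner PLS regression of the published method, it uses an ordinary least-squares step (a pseudo-inverse). That substitution is a simplification made here. When there are many more buckets than samples it can fit noise, which is one reason the published algorithm uses PLS at that step.

```python
import numpy as np


def osc_component(Xc, y, max_iter=100, tol=1e-10):
    """Remove one OSC component from a mean-centred matrix Xc.

    y holds the class labels (e.g. 1 = control, 0 = osteoporosis); both
    classes must be present. Returns (corrected X, removed score t, loading p).
    """
    Xc = np.asarray(Xc, dtype=float)
    yc = np.asarray(y, dtype=float) - np.mean(y)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    t = U[:, 0] * s[0]                          # start from the first PC score
    X_pinv = np.linalg.pinv(Xc, rcond=1e-10)    # least-squares solver for X w = t
    for _ in range(max_iter):
        t_orth = t - yc * (yc @ t) / (yc @ yc)  # remove the part of t along y
        w = X_pinv @ t_orth                     # weights with Xc @ w close to t_orth
        w /= np.linalg.norm(w)
        t_new = Xc @ w
        done = np.linalg.norm(t_new - t) <= tol * np.linalg.norm(t_new)
        t = t_new
        if done:
            break
    p = Xc.T @ t / (t @ t)                      # loading of the removed component
    return Xc - np.outer(t, p), t, p


# Usage on random stand-in data (10 samples, 30 buckets).
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 30))
Xc = X - X.mean(axis=0)
y = np.array([1] * 5 + [0] * 5)
X_osc, t_removed, p_removed = osc_component(Xc, y)
```

To filter further, the corrected matrix can be passed back to the function to remove a second component, and then analysed by PCA as before.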

Improved separation is possible using PLS-DA (rather than the unsupervised PCA). A
scores plot and the corresponding loadings plot are shown in Figure 1E-OP and Figure
1F-OP, respectively. Improved separation is evident, with controls dominating the
right-hand side of the plot and osteoporosis samples dominating the left-hand side.
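PLS-DA is PLS regression against a class-indicator variable. A compact PLS1 (single-response) NIPALS sketch follows. It codes control as 1 and osteoporosis as 0, matching the validation described later in this example. The function names, the dictionary return value, and the default of two components are illustrative choices and do not come from SIMCA-P.

```python
import numpy as np


def pls1_fit(X, y, n_components=2):
    """Fit a PLS1 model by NIPALS.

    X: samples x buckets (e.g. normalised bucket integrals);
    y: class codes (1 = control, 0 = osteoporosis).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean               # centred copies to be deflated
    n, k = X.shape
    W = np.zeros((k, n_components))
    P = np.zeros((k, n_components))
    T = np.zeros((n, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)                  # X-weights
        t = E @ w                               # scores
        tt = t @ t
        p = E.T @ t / tt                        # X-loadings
        q[a] = f @ t / tt                       # y-loading
        E = E - np.outer(t, p)                  # deflate X
        f = f - q[a] * t                        # deflate y
        W[:, a], P[:, a], T[:, a] = w, p, t
    B = W @ np.linalg.solve(P.T @ W, q)         # regression coefficients
    return {"x_mean": x_mean, "y_mean": y_mean,
            "W": W, "P": P, "T": T, "q": q, "B": B}


def pls1_predict(model, X):
    """Predicted y; values of 0.5 or more would be read as class 1 (control)."""
    X = np.asarray(X, dtype=float)
    return (X - model["x_mean"]) @ model["B"] + model["y_mean"]


# Usage on synthetic data in which five buckets differ between the classes.
rng = np.random.default_rng(1)
y = np.array([1] * 10 + [0] * 10)
X = rng.normal(scale=0.5, size=(20, 50))
X[:10, :5] += 2.0
model = pls1_fit(X, y, n_components=2)
y_hat = pls1_predict(model, X)
```

The coefficient vector `model["B"]` corresponds to a regression coefficient plot, with one value per bucket.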
Figure 2A-OP shows sections of the variable importance plots (VIP) and regression
coefficient plots derived from the PLS-DA model described in Figure 1E-OP.

Figure 2B-OP shows a section of the regression coefficient plot derived from the PLS-DA
model described in Figure 1E-OP. In the regression coefficient plot, each bar represents
a spectral region covering 0.04 ppm and shows how the NMR profile of the control
samples differs from that of the osteoporosis samples. A positive value on the x-axis
indicates a relatively greater concentration of metabolite (assigned using NMR chemical
shift assignment tables), and a negative value on the x-axis indicates a relatively lower
concentration of metabolite.
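VIP scores, as plotted in Figure 2A-OP, summarise how much each bucket contributes to the PLS model. A common definition is VIP_j = sqrt(K · Σ_a SS_a (w_ja / ‖w_a‖)² / Σ_a SS_a). Here K is the number of buckets, w_a is the X-weight vector of component a, and SS_a = q_a² · t_aᵀt_a is the y-variance that component explains. The sketch implements this formula from the weights W, scores T, and y-loadings q of any fitted PLS1 model. It is not taken from SIMCA-P, whose implementation may differ in detail. `top_buckets` is a hypothetical helper that ranks buckets by VIP.

```python
import numpy as np


def vip_scores(W, T, q):
    """VIP for a PLS1 model.

    W: X-weights (buckets x A); T: scores (samples x A); q: y-loadings (A,).
    """
    W = np.asarray(W, dtype=float)
    T = np.asarray(T, dtype=float)
    q = np.asarray(q, dtype=float)
    n_buckets = W.shape[0]
    ss = q ** 2 * np.sum(T ** 2, axis=0)        # y-variance explained per component
    w_norm = W / np.linalg.norm(W, axis=0)      # unit-length weight vectors
    return np.sqrt(n_buckets * (w_norm ** 2 @ ss) / ss.sum())


def top_buckets(centres, vip, n=10):
    """Hypothetical helper: the n bucket centres with the largest VIP."""
    order = np.argsort(vip)[::-1][:n]
    return [(float(centres[i]), float(vip[i])) for i in order]


# Usage with a tiny hand-made model: 3 buckets, 2 components.
W = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
T = np.array([[2.0, 0.0], [0.0, 1.0]])
q = np.array([1.0, 2.0])
vip = vip_scores(W, T, q)
```

With normalised weights, the squared VIP values sum to the number of buckets. This is why VIP > 1 is often read as "above-average importance".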

The 10 most important chemical shift windows for the PLS-DA model are summarised in
the following table. The assignments were made by comparing the loadings with
published tables of NMR data.







Table 1-OP

#  | Bucket region (ppm) | Assignment                      | Chem. shift (ppm) and multiplicity | NMR spectral intensity in osteoporosis wrt control
1  | 1.34                | predominantly lipid CH2CH2CH2CO | 1.32 (m)                           | decreased*
   |                     | also lactate CH3                | 1.33 (d)                           | increased*
2  | 1.30                | lipid CH2                       | 1.30 (m)                           | decreased
3  | 1.26                | lipid (CH2)n, mainly LDL        | 1.25 (m)                           | decreased
4  | 0.86                | lipid CH3, mainly LDL, VLDL     | 0.84 (t) & 0.87 (t)                | decreased
5  | 3.38                | proline half δ-CH2              | 3.34 (m)                           | decreased
6  | 2.06                | proline half β-CH2              | 2.05 (m)                           | decreased
7  | 2.02                | proline γ-CH2                   | 1.99 (m)                           | decreased
8  | 4.10                | lactate CH                      | 4.11 (q)                           | increased
9  | 3.34                | proline half δ-CH2              | 3.34 (m)                           | decreased
10 | 3.22                | choline N(CH3)3                 | 3.21 (s)                           | decreased

* Intensity changes of these overlapped peaks were determined by referral to the original
¹H NMR spectra.



In summary, with respect to control samples, osteoporosis samples appear to have
decreased levels of lipids, proline, choline, and 3-hydroxybutyrate, and increased levels
of lactate, alanine, creatine, creatinine, glucose, and aromatic amino acids. Additional
data for the buckets associated with these species are given in the following table.
Again, the assignments were made by comparing the loadings with published tables of
NMR data.



Table 2-OP

Species               | Bucket region (ppm) | Assignment     | Chem. shift (ppm) and multiplicity | NMR spectral intensity in osteoporosis wrt control*
lipid                 | 1.34                | CH2CH2CH2CO    | 1.32 (m)                           | decreased
                      | 1.30                | CH2            | 1.30 (m)                           | decreased
                      | 1.26                | (CH2)n, LDL    | 1.25 (m)                           | decreased
                      | 1.22                | CH2CH2CH2      | 1.22 (m)                           |
                      | 0.86                | CH3, LDL, VLDL | 0.84 (t) & 0.87 (t)                | decreased
proline               | 3.38                | half δ-CH2     | 3.34 (m)                           | decreased
                      | 3.46                | half δ-CH2     | 3.45 (m)                           | decreased
                      | 3.42                | half δ-CH2     | 3.45 (m)                           | decreased
                      | 2.34                | half β-CH2     | 2.36 (m)                           | decreased
                      | 2.06                | half β-CH2     | 2.05 (m)                           | decreased
                      | 2.02                | γ-CH2          | 1.99 (m)                           | decreased
choline               | 3.22                | N(CH3)3        | 3.21 (s)                           | decreased
                      | 3.66                | NCH2           | 3.66 (m)                           | decreased
3-hydroxybutyrate     | 4.14                | β-CH           | 4.13 (m)                           | decreased
                      | 2.38                | half α-CH2     | 2.38 (m)                           | decreased
                      | 2.30                | half α-CH2     | 2.31 (m)                           | decreased
                      | 1.14                | γ-CH3          | 1.20 (d)                           | decreased
lactate               | 4.14 & 4.10         | CH             | 4.11 (q)                           | increased
                      | 1.34                | CH3            | 1.33 (d)                           | increased
alanine               | 3.74                | α-CH           | 3.76 (q)                           | increased
                      | 1.46                | CH3            | 1.46 (d)                           | increased
creatine              | 3.90                | CH2            | 3.93 (s)                           | increased
                      | 3.02                | CH3            | 3.04 (s)                           | increased
creatinine            | 4.06                | CH2            | 4.05 (s)                           | increased
                      | 3.06                | CH3            | 3.05 (s)                           | increased
glucose               | 3.66-4.42           | various        | 3.2-5.5                            | increased
aromatic amino acids* | 7.00-8.00           | various        | 7.00-8.00                          | increased



The intensity changes for the proline resonances at δ 3.42 and δ 3.46, the choline
resonance at δ 3.66, the lactate resonance at δ 1.34, and the β-hydroxybutyrate
resonance at δ 4.14, all of which overlap with other peaks, were confirmed by referral to
the original NMR spectra.

Validation

Validation was performed using a y-predicted scatter plot. Figure 3-OP shows the
y-predicted scatter plot, and hence the ability of NMR-based metabonomics to predict
class membership (control or osteoporosis) of unknown samples. Using ~85% of the
control and osteoporosis samples, a PLS-DA model was constructed and used to predict
the presence of disease in the remaining 15% of samples (the validation set). The
y-predicted scatter plot assigns samples to either class 1 (in this case corresponding to
control) or class 0 (in this case corresponding to osteoporosis); 0.5 is the cut-off. The
PLS-DA model predicted the presence or absence of osteoporosis in 100% of cases.
Furthermore, for a four-component model, class can be predicted with a significance level
≥ 88%, using a 99% confidence limit.
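The validation scheme can be sketched in plain Python. The text states only that about 85% of the samples were used for training and 15% were held out. Holding out each class in proportion (stratification), the random seed, and the helper names are assumptions made for illustration. `assign_class` applies the 0.5 cut-off, with class 1 = control and class 0 = osteoporosis, as in the text.

```python
import random


def stratified_split(labels, test_fraction=0.15, seed=0):
    """Return (train_indices, test_indices), holding out ~test_fraction of each class."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, c in enumerate(labels) if c == cls]
        rng.shuffle(idx)
        n_test = max(1, round(test_fraction * len(idx)))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def assign_class(y_pred, cutoff=0.5):
    """Class 1 (control) if predicted y reaches the cut-off, else class 0 (osteoporosis)."""
    return [1 if v >= cutoff else 0 for v in y_pred]


def accuracy(y_true, y_class):
    """Fraction of samples whose assigned class matches the true class."""
    return sum(t == c for t, c in zip(y_true, y_class)) / len(y_true)


labels = [1] * 40 + [0] * 29          # 40 controls, 29 osteoporosis (from the text)
train_idx, test_idx = stratified_split(labels)
```

For the 40 controls and 29 patients in this study, this split holds out 6 controls and 4 patients, i.e. 10 of the 69 samples (about 14.5%).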

Proline as Diagnostic Species/Biomarker

Following this analysis, the buckets designated 3.38, 2.06, 2.02, and 3.34 were identified
as having lower intensity in osteoporosis patient plasma as compared to control samples.

Re-examination of the original NMR spectra, rather than the data-reduced, segmented
files derived from them which are used for the statistical analysis, enables a visual
inspection of the NMR peaks in those specific regions. Identification of the peak
multiplicities in these regions leads a trained NMR spectroscopist to suggest free proline
as the molecule responsible for the peaks. The fact that these peaks are spin-coupled to
each other, and hence are part of the same molecule, comes from interpretation of the
cross-peaks seen in a 2-dimensional COSY spectrum. The NMR peaks seen in the
conventional 1-dimensional NMR spectrum are then compared visually with those of
authentic proline dissolved in water at a comparable pH value. See, for example,
Ellenberger et al., 1975; Lindon et al., 1999.

The regions 3.38 and 3.34 are both seen to include part of a multiplet at δ 3.34
assignable to one of the protons of the δ-CH2 pair of hydrogen atoms. The region
designated 2.06 shows a resonance at δ 2.05 identifiable as one of the protons from the
β-CH2 group. Similarly, the region designated 2.02 contains a resonance at δ 1.99
identified as one or both of the γ-CH2 protons of proline (the chemical shift difference
between the two γ protons is small). The peak multiplicity of each of these peaks is
consistent with an authentic sample of proline measured under comparable conditions.

There are 4 other proton resonances for proline which should also show a change in
level with osteoporosis if proline is a biomarker. These are the other β-, γ-, and δ-CH2
protons at δ 2.34, ~δ 2.0, and δ 3.45, respectively, and the α-CH proton at δ 4.14. Indeed,
examination of the spectra shows that the intensities of the signals for the other β-CH2
and δ-CH2 protons also correlate with the diagnosis. It is not possible to distinguish the
other γ-CH2 proton because its shift is close to that of the first γ-CH2 proton, and it may
already have been included above. Nor is it possible to observe the chemical shift of the
α-CH proton because of spectral overlap.






Finally, confirmation that proline is the substance responsible for the diagnostic NMR
peaks is obtained by adding a sample of authentic proline to a plasma sample and noting
complete coincidence of all of the endogenous signals assigned to proline with those of
the added proline.



The NMR chemical shifts for all amino acids, including proline, are dependent on the
solution pH because of the presence of ionisable groups. In the case of proline,
these are the carboxylic acid group (-COOH) and the secondary amine group (-NH-).
Hence it is important to compare the NMR spectra of plasma with that of an authentic
sample of proline at the same pH. This has been done as described above.
In addition, it is possible for amino acids to react with bicarbonate ion (HCO3⁻) in a
biological sample to form carbamate adducts, i.e., adducts formed between the amino
group of the amino acid and the bicarbonate ion. The resulting adduct has NMR chemical
shifts different from those of the parent amino acid. This problem has not been seen with
proline specifically. However, this problem of changed chemical shifts can be overcome by
adding authentic proline to the appropriate plasma sample and noting exact coincidence
of all of the added proline proton peaks with those of the endogenous biomarker peaks.



The foregoing has described the principles, preferred embodiments, and modes of
operation of the present invention. However, the invention should not be construed as
limited to the particular embodiments discussed. Instead, the above-described
embodiments should be regarded as illustrative rather than restrictive, and it should be
appreciated that variations may be made in those embodiments by workers skilled in the
art without departing from the scope of the present invention as defined by the appended
claims.



WO 02/086501



-120- 



PCT/GB02/01862 



REFERENCES 

A number of patents and publications are cited herein in order to more fully describe and
disclose the invention and the state of the art to which the invention pertains. Full
citations for these references are provided herein. Each of these references is
incorporated herein by reference in its entirety into the present disclosure, to the same
extent as if each individual reference were specifically and individually indicated to be
incorporated by reference.

Ala-Korpela, M., 1995, "H-1 NMR spectroscopy of human blood plasma," Progress in Nuclear Magnetic Resonance Spectroscopy, Vol. 27, pp. 475-554.

Ala-Korpela, M., Hiltunen, Y. and Bell, J.D., 1995, "Quantification of biomedical NMR data using artificial neural network analysis: Lipoprotein lipid profiles from H-1 NMR data of human plasma," NMR Biomed., Vol. 8, pp. 235-244.

Andersen, C.A., 1999, "Direct orthogonalization," Chemometrics and Intelligent Laboratory Systems, Vol. 47, pp. 51-63.

Anker, L.S. and Jurs, P.C., 1992, "Prediction of C-13 nuclear magnetic resonance chemical shifts by artificial neural networks," Anal. Chem., Vol. 64, pp. 1157-1164.

Anthony, M.L. et al., 1994, "Pattern recognition classification of the site of nephrotoxicity based on metabolic data derived from proton nuclear magnetic resonance spectra of urine," Mol. Pharmacol., Vol. 46, pp. 199-211.

Anthony, M.L. et al., 1995, "Classification of toxin-induced changes in NMR spectra of urine using an artificial neural network," J. Pharm. Biomed. Anal., Vol. 13, pp. 205-211.

Beckwith-Hall, B.M. et al., 1998, "Nuclear magnetic spectroscopic and principal components analysis investigations into biochemical effects of three model hepatotoxins," Chem. Res. Tox., Vol. 11, pp. 260-272.

Berman, J.W., Guida, M.P., Warren, J., Amat, J. and Brosnan, C.F., 1996, "Localization of monocyte chemoattractant peptide-1 expression in the central nervous system in experimental autoimmune encephalomyelitis and trauma in the rat," Journal of Immunology, Vol. 156, pp. 3017-3023.

Berman, J.L., Wynne, J. and Cohn, P.F., 1978, "A multivariate approach for interpreting treadmill exercise tests in coronary artery disease," Circulation, Vol. 58, pp. 505-512.

Bishop, C., 1995, Neural Networks for Pattern Recognition, University Press, Oxford, England, pp. 164-193.

Breslow, J.L., 1993, "Transgenic mouse models of lipoprotein metabolism and atherosclerosis," Proc. Natl. Acad. Sci. USA, Vol. 90, pp. 8314-8318.

Bretthorst, G.L., 1990a, "Bayesian Analysis. 2. Signal-Detection and Model Selection," J. Magn. Reson., Vol. 88, pp. 552-570.

Bretthorst, G.L., 1990b, "Bayesian Analysis. 3. Applications to NMR Signal-Detection, Model Selection, and Parameter-Estimation," J. Magn. Reson., Vol. 88, pp. 571-595.

Bretthorst, G.L., Hung, C.C., Davignon, D.A. et al., 1988, "Bayesian-Analysis of Time-Domain Magnetic Resonance Signals," J. Magn. Reson., Vol. 79, pp. 369-376.

Bro, R., 1997, "PARAFAC. Tutorial and applications," Chemometrics and Intelligent Laboratory Systems, Vol. 38, pp. 149-171.

Broomhead, D.S. and Lowe, D., 1988, "Multi-variable functional interpolation and adaptive networks," Complex Systems, Vol. 2, pp. 321-355.

Brown, T.R. and Stoyanova, R., 1996, "NMR spectral quantitation by principal-component analysis. 2. Determination of frequency and phase shifts," J. Magn. Reson., Series B, Vol. 112, pp. 32-43.

Bruce, R.A., 1974, "The value of the Balke protocol," Am. Heart J., Vol. 88, pp. 533-534.

Claridge, T.D.W., 2000, High-Resolution NMR Techniques in Organic Chemistry: A Practical Guide to Modern NMR for Chemists, Oxford University Press.

Collins, F.S. and McKusick, V.A., 2001, "Implications of the Human Genome Project for medical science," JAMA, Vol. 285, pp. 540-544.

Confort-Gouny, S., Vion-Dury, J., Nicoli, P., Dano, P., Gastaut, J.-L. and Cozzone, P.J., 1992, "Metabolic characterization of neurological diseases by proton localized NMR-spectroscopy of the human brain," Comptes Rendus de l'Academie des Sciences Serie III - Sciences de la Vie-Life Sciences, Vol. 315, pp. 287-293.

Cullen, P., Funke, H., Schulte, H. and Assmann, G., 1998, "Lipoproteins and cardiovascular risk - from genetics to CHD prevention," European Heart Journal, Vol. 19, Suppl. C, pp. C5-C11.

Despres, J., Lemieux, I., Dagenais, G., Cantin, B. and Lamarche, B., 2000, "HDL-cholesterol as a marker of coronary heart disease risk: the Quebec cardiovascular study," Atherosclerosis, Vol. 153, pp. 263-272.

Dolecek, T.A., Milas, N.C., Van Horn, L.V., Farrand, M.E., Gorder, D.D., Duchene, A.G., Dyer, J.R., Stone, P.A. and Randall, B.L., 1986, "A long-term nutrition intervention experience - lipid responses and dietary adherence patterns in the multiple risk factor intervention trial," J. Am. Diet. Assoc., Vol. 86, pp. 752-758.

Dutt, M.J. and Lee, K.H., 2000, "Proteomic analysis," Curr. Opin. Biotechnol., Vol. 11, pp. 176-179.

Dvorak, A.M., Schroeder, J.T., MacGlashan, D.W., Bryan, K.P., Morgan, E.S., Lichtenstein, L.M. and MacDonald, S.M., 1996, "Comparative ultrastructural morphology of human basophils stimulated to release histamine by anti-IgE, recombinant IgE-dependent histamine-releasing factor, or monocyte chemotactic protein-1," Journal of Allergy and Clinical Immunology, Vol. 98, pp. 355-370.

Eriksson, L., Johansson, E., Kettaneh-Wold, N. and Wold, S., 1999, Introduction to Multi- and Megavariate Analysis using Projection Methods (PCA & PLS), UMETRICS Inc. (Box 7960, SE90719 Umea, Sweden), pp. 267-296.

Fan, T.W.-M., 1996, "Metabolite profiling by one- and two-dimensional NMR analysis of complex mixtures," Prog. NMR Spectrosc., Vol. 28, pp. 161-219.

Farrant, R.D. et al., 1992, "An automatic data reduction and transfer method to aid pattern-recognition analysis and classification of NMR spectra," J. Pharm. Biomed. Anal., Vol. 10, pp. 141-144.

Fearn, T., 2000, "On orthogonal signal correction," Chemometrics and Intelligent Laboratory Systems, Vol. 50, pp. 47-52.

Frank, I.E. et al., 1984, "Prediction of product quality from spectral data using the partial least-squares method," J. Chem. Info. Comp., Vol. 24, pp. 20-24.

Garrod, S., Humpfer, E., Connor, S.C., Connelly, J.C., Spraul, M., Nicholson, J.K. and Holmes, E., 2001, "High-resolution H-1 NMR and magic angle spinning NMR spectroscopic investigation of the biochemical effects of 2-bromoethanamine in intact renal and hepatic tissue," Magn. Reson. Med., Vol. 45, pp. 781-790.

Gartland, K.P.R. et al., 1990a, "A pattern recognition approach to the comparison of ¹H NMR and clinical chemical data for classification of nephrotoxicity," J. Pharm. Biomed. Anal., Vol. 8, pp. 963-968.

Gartland, K.P.R. et al., 1990b, "Pattern recognition analysis of high resolution ¹H NMR spectra of urine. A nonlinear mapping approach to the classification of toxicological data," NMR in Biomed., Vol. 3, pp. 166-172.

Gartland, K.P.R. et al., 1991, "The application of pattern recognition methods to the analysis and classification of toxicological data derived from proton NMR spectroscopy of urine," Mol. Pharmacol., Vol. 39, pp. 629-642.

Geisow, M.J., 1998, "Proteomics: One small step for a digital computer, one giant leap for humankind," Nature Biotechnology, Vol. 16, p. 206.

Ghirnikar, R.S., Lee, Y.L., He, T.R. and Eng, L.F., 1996, "Chemokine expression in rat stab wound brain injury," Journal of Neuroscience Research, Vol. 46, pp. 727-733.

Gong, J.H., Ratkay, L.G., Waterfield, J.D. and Clark-Lewis, I., 1997, "An antagonist of monocyte chemoattractant protein 1 (mcp-1) inhibits arthritis in the mrl-lpr mouse model," Journal of Experimental Medicine, Vol. 186, pp. 131-137.

Guyton, A.C., 1991, "Chapter 12: Electrocardiographic interpretation of cardiac muscle and coronary abnormalities," in: A Textbook of Medical Physiology, Eighth Edition (W.B. Saunders, London), pp. 124-137.

Gygi, S.P., Rochon, Y., Franza, B.R. and Aebersold, R., 1999, "Correlation between protein and mRNA abundance in yeast," Molecular and Cellular Biology, Vol. 19, pp. 1720-1730.

Hare, B.J. and Prestegard, J.H., 1994, "Application of neural networks to automated assignment of NMR spectra of proteins," J. Biomol. NMR, Vol. 4, pp. 35-46.

Hiltunen, Y., Heiniemi, E. and Ala-Korpela, M., 1995, "Lipoprotein lipid quantification by neural-network analysis of H-1 NMR data from human blood-plasma," J. Mag. Res. Ser. B, Vol. 106, pp. 191-194.

Holmes, E. et al., 1998a, "Development of a model for classification of toxin-induced lesions using NMR spectroscopy of urine combined with pattern recognition," NMR in Biomed., Vol. 11, pp. 235-244.

Holmes, E. et al., 1998b, "The identification of novel biomarkers of renal toxicity using automatic data reduction techniques and PCA of proton NMR spectra of urine," Chemomet. & Intel. Lab. Systems, Vol. 44, pp. 245-255.

Holmes, E. et al., 1992, "NMR spectroscopy and pattern recognition analysis of the biochemical processes associated with the progression and recovery from nephrotoxic lesions in the rat induced by mercury(II) chloride and 2-bromoethanamine," Mol. Pharmacol., Vol. 42, pp. 922-930.

Holmes, E. et al., 1994, "Automatic data reduction and pattern recognition methods for analysis of NMR spectra of human urine from normal and pathological states," Anal. Biochem., Vol. 220, pp. 284-296.

Howells, S.L., Maxwell, R.J., Howe, F.A., Peet, A.C., Stubbs, M., Rodrigues, L.M., Robinson, S.P., Baluch, S. and Griffiths, J.R., 1993, "Pattern-recognition of P-31 magnetic-resonance spectroscopy tumor spectra obtained in-vivo," NMR Biomed., Vol. 6, pp. 237-241.

Iida, K., Kadota, J., Kawakami, K., Matsubara, Y., Shirai, R. and Kohno, S., 1997, "Analysis of T cell subsets and beta chemokines in patients with pulmonary sarcoidosis," Thorax, Vol. 52, pp. 431-437.

Isles, C.G. and Paterson, J.R., 2000, "Identifying patients at risk for coronary heart disease: Implications from trials of lipid-lowering drug therapy," Q. J. Med., Monthly Journal of the Association of Physicians, Vol. 93, pp. 567-574.

Joreskog, K.G. and Wold, H., 1982, Systems under Indirect Observation, North Holland, Amsterdam.

Kannel, W.B. and Gordon, T. (eds.), February 1974, The Framingham Study: An Epidemiological Investigation of Cardiovascular Disease, DHEW Pub. No. (NIH) 74-599, Public Health Service, Washington, DC (U.S. Government Printing Office).

Kjelsberg, M.O., Cutler, J.A. and Dolecek, T.A., 1997, "Brief description of the Multiple Risk Factor Intervention Trial," Amer. J. Clinical Nutrition, Vol. 65 (supplement), pp. S191-S195.

Klenk, H.P. et al., 1997, "The complete genome sequence of the hyperthermophilic, sulphate-reducing archaeon Archaeoglobus fulgidus," Nature, Vol. 390, pp. 364-370.

Kopka, J., Dormann, P., Altmann, T., Trethewey, R.N. and Willmitzer, L., 2000, "Metabolic profiling for plant functional genomics," Nature Biotechnology, Vol. 18, pp. 1157-1161.

Kowalski, B.R., Sharaf, M. and Illman, D., 1986, Chemometrics, John Wiley & Sons, Chichester.

Kuesel, A.C., Stoyanova, R., Aiken, N.R., Li, C.-W., Szwergold, B.S., Shaller, C. and Brown, T.R., 1996, "Quantitation of resonances in biological P-31 NMR spectra via principal component analysis: Potential and limitations," NMR Biomed., Vol. 9, pp. 93-104.

Kuller, L.H., Ockene, J.K., Meilahn, E., Wentworth, D.N., Svendsen, K.H. and Neaton, J.D., 1991, "Cigarette-smoking and mortality," Preventive Medicine, Vol. 20, pp. 638-654.

Kvalheim, O.M. and Karstang, T.V., 1989, "Interpretation of latent-variable regression models," Chemometrics and Intelligent Laboratory Systems, Vol. 7, pp. 39-51.

Lindon, J.C. et al., 1980, "Digitisation and Data Processing in Fourier Transform NMR," Progress in NMR Spectroscopy, Vol. 14, pp. 27-66.

Lindon, J.C. et al., 1999, "NMR spectroscopy of biofluids," in: Annual Reports on NMR Spectroscopy (Webb, G.A., ed.), Academic Press (London), Vol. 38, pp. 1-88.

Lindon, J.C., Holmes, E. and Nicholson, J.K., 2001, "Pattern recognition methods and applications in biomedical magnetic resonance," Progress in NMR Spectroscopy, Vol. 39, pp. 1-40.

Martin, G.J., 1998, "Recent advances in site-specific natural isotope fractionation studied by nuclear magnetic resonance," Isotopes in Environmental and Health Studies, Vol. 34, pp. 233-243.

Martin, M.L. and Martin, G.J., 1999, "Site-specific isotope effects and origin inference," Analysis, Vol. 27, pp. 209-213.

Martin, T.R., Galli, S.J., Katona, I.M. and Drazen, J.M., 1989, "Role of mast-cells in anaphylaxis - evidence for the importance of mast-cells in the cardiopulmonary alterations and death induced by anti-IgE in mice," Journal of Clinical Investigation, Vol. 83, pp. 1375-1383.

Mazzucchelli, L., Hauser, C., Zgraggen, K., Wagner, H.E., Hess, M.W., Laissue, J.A. and Mueller, C., 1996, "Differential in situ expression of the genes encoding the chemokines mcp-1 and rantes in human inflammatory bowel disease," Journal of Pathology, Vol. 178, pp. 201-206.

McIlvain, H.E., McKinney, M.E., Thompson, A.V. and Todd, G.L., 1992, "Application of the MRFIT smoking cessation program to a healthy, mixed-sex sample," Am. J. Prev. Med., Vol. 8, pp. 165-170.

Moka, D. et al., 1998, "Biochemical classification of kidney carcinoma biopsy samples using magic angle spinning NMR spectroscopy," J. Pharm. Biomed. Anal., Vol. 17, pp. 125-132.

Morvan, D., Jehenson, P., Duboc, D. and Syrota, A., 1990, "Discriminant factor-analysis of P-31 NMR spectroscopic data in myopathies," Magn. Reson. Med., Vol. 13, pp. 216-227.

Multiple Risk Factor Intervention Trial (MRFIT) Research Group, 1986, "Relationship between baseline risk factors and coronary heart disease and total mortality in the Multiple Risk Factor Intervention Trial," Prev. Med., Vol. 15, pp. 254-273.

Nicholson, J.K. et al., 1989, "High resolution proton magnetic resonance spectroscopy of biological fluids," Prog. NMR Spectrosc., Vol. 21, pp. 449-501.

Nicholson, J.K. et al., 1995, "750 MHz ¹H and ¹H-¹³C NMR spectroscopy of human blood plasma," Analytical Chemistry, Vol. 67, pp. 793-811.

Nicholson, J.K. et al., 1999, "Metabonomics - understanding the metabolic responses of living systems to pathophysiological stimuli via multivariate statistical analysis of biological NMR spectroscopic data," Xenobiotica, Vol. 29, pp. 1181-1189.

Nilsson, N.J., 1965, Learning Machines, McGraw-Hill, New York.

Ogata, H., Takeya, M., Yoshimura, T., Takagi, K. and Takahashi, K., 1997, "The role of monocyte chemoattractant protein-1 (mcp-1) in the pathogenesis of collagen-induced arthritis in rats," Journal of Pathology, Vol. 182, pp. 106-114.

Parzen, E., 1962, "On estimation of a probability density function and mode," Ann. Math. Stat., Vol. 33, pp. 1065-1076.

Patterson, D., 1996, Artificial Neural Networks, Prentice Hall, Singapore.

Plump, A.S., Smith, J.D., Hayek, T., Aalto-Setala, K., Walsh, A., Verstuyft, J.G., Rubin, E.M. and Breslow, J.L., 1992, "Severe hypercholesterolemia and atherosclerosis in apolipoprotein E-deficient mice created by homologous recombination in ES cells," Cell, Vol. 71, pp. 343-353.

Press, W.H., Teukolsky, S.A., Vetterling, W.T. and Flannery, B.P., January 1993, Numerical Recipes in C: The Art of Scientific Computing, 2nd edition, Cambridge University Press.

Quinlan, J.R., 1986, "Induction of decision trees," Machine Learning, Vol. 1, pp. 81-106.

Ross, R., 1999, "Mechanisms of disease - Atherosclerosis - An inflammatory disease," The New England Journal of Medicine, Vol. 340, pp. 115-126.

Sach, M., Bauermeister, K., Burger, J., Loetscher, P., Eisner, J., Schollmeyer, P. and Dobos, G., 1997, "Inverse mcp-1/il-8 ratio in effluents of CAPD patients with peritonitis and in isolated cultured human peritoneal macrophages," Nephrology, Dialysis and Transplantation, Vol. 12, pp. 315-320.

Sjostrom, M., Wold, S. and Soderstrom, B., 1986, "PLS Discriminant Plots," Proceedings of PARC in Practice, Amsterdam, June 19-21, 1985, Elsevier Science Publishers B.V., North Holland.

Somorjai, R.L., Nikulin, A.E., Pizzi, N., Jackson, D., Scarth, G., Dolenko, B., Gordon, H., Russell, P., Lean, C.L., Delbridge, L., Mountford, C.E. and Smith, I.C.P., 1995, "Computerized consensus diagnosis - a classification strategy for the robust analysis of MR spectra. 1. Application to H-1 spectra of thyroid neoplasms," Magn. Reson. Med., Vol. 33, pp. 257-263.

Specht, D.F., 1990, "Probabilistic Neural Networks," Neur. Networks, Vol. 3, pp. 109-118.

Spraul, M. et al., 1994, "Automatic reduction of NMR spectroscopic data for statistical and pattern recognition classification of samples," J. Pharm. Biomed. Anal., Vol. 12, pp. 1215-1225.

Stahle, L. and Wold, S., 1987, "Partial Least Squares Analysis with Cross-Validation for the Two-Class Problem: A Monte Carlo Study," Journal of Chemometrics, Vol. 1, pp. 185-196.

Stoyanova, R., Kuesel, A.C. and Brown, T.R., 1995, "Application of principal-component analysis for NMR spectral quantitation," J. Magn. Reson., Series A, Vol. 115, pp. 265-269.

Sugiyama, Y., Kasahara, T., Mukaida, N., Matsushima, K. and Kitamura, S., 1995, "Chemokines in bronchoalveolar lavage fluid in summer-type hypersensitivity pneumonitis," European Respiratory Journal, Vol. 8, pp. 1084-1090.

Sun, J., 1997, "Statistical analysis of NIR data: data pretreatment," Journal of Chemometrics, Vol. 11, pp. 525-532.

Sze, D.Y. et al., 1994, "High-resolution proton NMR studies of lymphocyte extracts," Immunomethods, Vol. 4, pp. 113-126.

Tomlins, A.M. et al., 1998, "High resolution magic angle spinning ¹H NMR analysis of intact prostatic hyperplastic and tumour tissues," Anal. Comm., Vol. 35, pp. 113-115.

Tranter, G.E. et al., 1999, "Metabonomic prediction of drug toxicity via probabilistic neural network analysis of NMR biofluid data," Abstr. 9th North American ISSX Meeting, Oct 24-28, 1999, p. 246.

Volejnikova, S., Laskari, M., Marks Jr., S.C. and Graves, D.T., 1997, "Monocyte recruitment and expression of monocyte chemoattractant protein-1 are developmentally regulated in remodeling bone in the mouse," American Journal of Pathology, Vol. 150, pp. 1711-1721.

Wasserman, P.D., 1989, Neural Computing: Theory and Practice, Van Nostrand Reinhold, New York, USA.

Weber, O.M., Duc, C.O., Meier, D. and Boesiger, P., 1998, "Heuristic optimization algorithms applied to the quantification of spectroscopic data," Magn. Reson. Med., Vol. 39, pp. 723-730.

Westerhuis, J.A., de Jong, S. and Smilde, A.K., 2001, "Direct orthogonal signal correction," Chemometrics and Intelligent Laboratory Systems, Vol. 56, pp. 13-25.

Wise, B.M. and Gallagher, N.B., 2001, http://www.eigenvector.com/MATL

Wold, H., 1966, in: Multivariate Analysis (Krishnaiah, P.R., ed.), Academic Press, New York.

Wold, S., 1976, "Pattern recognition by means of disjoint principal components models," Pattern Recog., Vol. 8, pp. 127-139.

Wold, S., Antti, H., Lindgren, F. and Ohman, J., 1998a, "Orthogonal Signal Correction of Near-Infrared Spectra," Chemometrics and Intelligent Laboratory Systems, Vol. 44, pp. 175-185.

Wold, S., Kettaneh, N., Friden, H. and Holmberg, A., 1998b, "Modelling and Diagnostics of Batch Processes and Analogous Kinetic Experiments," Chemometrics and Intelligent Laboratory Systems, Vol. 44, pp. 331-340.

Yokode, M., Hammer, R.E., Ishibashi, S., Brown, M.S. and Goldstein, J.L., 1990, "Diet-induced hypercholesterolemia in mice: prevention by over-expression of LDL receptors," Science, Vol. 250, pp. 1273-1275.

Zeyneloglu, H.B., Seli, E., Senturk, L.M., Gutierrez, L.S., Olive, D.L. and Arici, A., 1998, "The effect of monocyte chemotactic protein 1 in intraperitoneal adhesion formation in a mouse model," American Journal of Obstetrics and Gynecology, Vol. 179, pp. 438-443.

Zheng, M.H., Fan, Y., Smith, A., Wysocki, S., Papadimitriou, J.M. and Wood, D.J., 1998, "Gene expression of monocyte chemoattractant protein-1 in giant cell tumors of bone osteoclastoma: possible involvement in CD68+ macrophage-like cell migration," Journal of Cellular Biochemistry, Vol. 70, pp. 121-129.

CLAIMS 

1. A method of classifying a sample, said method comprising the step of relating
NMR spectral intensity at one or more predetermined diagnostic spectral
windows for said sample with a predetermined condition associated with low
bone mineral density.

2. A method, according to claim 1, of classifying a sample from a subject, said
method comprising the step of relating NMR spectral intensity at one or more
predetermined diagnostic spectral windows for said sample with a predetermined
condition associated with low bone mineral density of said subject.

3. A method, according to claim 1, of classifying a sample, said method comprising
the step of relating NMR spectral intensity at one or more predetermined
diagnostic spectral windows for said sample with the presence or absence of a
predetermined condition associated with low bone mineral density.

4. A method, according to claim 1, of classifying a sample from a subject, said
method comprising the step of relating NMR spectral intensity at one or more
predetermined diagnostic spectral windows for said sample with the presence or
absence of a predetermined condition associated with low bone mineral density
of said subject.

5. A method, according to claim 1, of classifying a sample, said method comprising
the step of relating a modulation of NMR spectral intensity, relative to a control
value, at one or more predetermined diagnostic spectral windows for said sample
with a predetermined condition associated with low bone mineral density.

6. A method, according to claim 1, of classifying a sample from a subject, said
method comprising the step of relating a modulation of NMR spectral intensity,
relative to a control value, at one or more predetermined diagnostic spectral
windows for said sample with a predetermined condition associated with low
bone mineral density of said subject.

7. A method, according to claim 1, of classifying a sample, said method comprising
the step of relating a modulation of NMR spectral intensity, relative to a control
value, at one or more predetermined diagnostic spectral windows for said sample
with the presence or absence of a predetermined condition associated with low
bone mineral density.

8. A method, according to claim 1, of classifying a sample from a subject, said
method comprising the step of relating a modulation of NMR spectral intensity,
relative to a control value, at one or more predetermined diagnostic spectral
windows for said sample with the presence or absence of a predetermined
condition associated with low bone mineral density of said subject.



9. A method of classifying a subject, said method comprising the step of relating
NMR spectral intensity at one or more predetermined diagnostic spectral
windows for a sample from said subject with a predetermined condition
associated with low bone mineral density of said subject.

10. A method, according to claim 9, of classifying a subject, said method comprising
the step of relating NMR spectral intensity at one or more predetermined
diagnostic spectral windows for a sample from said subject with the presence or
absence of a predetermined condition associated with low bone mineral density
of said subject.

11. A method, according to claim 9, of classifying a subject, said method comprising
the step of relating a modulation of NMR spectral intensity, relative to a control
value, at one or more predetermined diagnostic spectral windows for a sample
from said subject with a predetermined condition associated with low bone
mineral density of said subject.

12. A method, according to claim 9, of classifying a subject, said method comprising
the step of relating a modulation of NMR spectral intensity, relative to a control
value, at one or more predetermined diagnostic spectral windows for a sample
from said subject with the presence or absence of a predetermined condition
associated with low bone mineral density of said subject.

* * *



13. A method of diagnosing a predetermined condition associated with low bone
mineral density of a subject, said method comprising the step of relating NMR
spectral intensity at one or more predetermined diagnostic spectral windows for a
sample from said subject with said predetermined condition of said subject.

14. A method, according to claim 13, of diagnosing a predetermined condition
associated with low bone mineral density of a subject, said method comprising
the step of relating NMR spectral intensity at one or more predetermined
diagnostic spectral windows for a sample from said subject with the presence or
absence of said predetermined condition of said subject.

15. A method, according to claim 13, of diagnosing a predetermined condition
associated with low bone mineral density of a subject, said method comprising
the step of relating a modulation of NMR spectral intensity, relative to a control
value, at one or more predetermined diagnostic spectral windows for a sample
from said subject with said predetermined condition of said subject.

16. A method, according to claim 13, of diagnosing a predetermined condition
associated with low bone mineral density of a subject, said method comprising
the step of relating a modulation of NMR spectral intensity, relative to a control
value, at one or more predetermined diagnostic spectral windows for a sample
from said subject with the presence or absence of said predetermined condition
of said subject.



17. A method of classifying a sample, said method comprising the step of relating the
amount of, or relative amount of, one or more diagnostic species present in said
sample with a predetermined condition associated with low bone mineral density.



18. A method, according to claim 17, of classifying a sample from a subject, said
method comprising the step of relating the amount of, or relative amount of, one
or more diagnostic species present in said sample with a predetermined condition
associated with low bone mineral density of said subject.

19. A method, according to claim 17, of classifying a sample, said method comprising
the step of relating the amount of, or relative amount of, one or more diagnostic
species present in said sample with the presence or absence of a predetermined
condition associated with low bone mineral density.

20. A method, according to claim 17, of classifying a sample from a subject, said
method comprising the step of relating the amount of, or the relative amount of,
one or more diagnostic species present in said sample with the presence or
absence of a predetermined condition associated with low bone mineral density
of said subject.

21. A method, according to claim 17, of classifying a sample, said method comprising
the step of relating a modulation of the amount of, or relative amount of, one or
more diagnostic species present in said sample, as compared to a control
sample, with a predetermined condition associated with low bone mineral density.

22. A method, according to claim 17, of classifying a sample from a subject, said
method comprising the step of relating a modulation of the amount of, or relative
amount of, one or more diagnostic species present in said sample, as compared
to a control sample, with a predetermined condition associated with low bone
mineral density of said subject.

23. A method, according to claim 17, of classifying a sample, said method comprising
the step of relating a modulation of the amount of, or relative amount of, one or
more diagnostic species present in said sample, as compared to a control
sample, with the presence or absence of a predetermined condition associated
with low bone mineral density.

24. A method, according to claim 17, of classifying a sample from a subject, said
method comprising the step of relating a modulation of the amount of, or relative
amount of, one or more diagnostic species present in said sample, as compared
to a control sample, with the presence or absence of a predetermined condition
associated with low bone mineral density of said subject.

25. A method of classifying a subject, said method comprising the step of relating the
amount of, or relative amount of, one or more diagnostic species present in a
sample from said subject with a predetermined condition associated with low
bone mineral density of said subject.

26. A method, according to claim 25, of classifying a subject, said method comprising
the step of relating the amount of, or relative amount of, one or more diagnostic
species present in a sample from said subject with the presence or absence of a
predetermined condition associated with low bone mineral density of said subject.

27. A method, according to claim 25, of classifying a subject, said method comprising
the step of relating a modulation of the amount of, or relative amount of, one or
more diagnostic species present in a sample from said subject, as compared to a
control sample, with a predetermined condition associated with low bone mineral
density of said subject.

28. A method, according to claim 25, of classifying a subject, said method comprising
the step of relating a modulation of the amount of, or relative amount of, one or
more diagnostic species present in a sample from said subject, as compared to a
control sample, with the presence or absence of a predetermined condition
associated with low bone mineral density of said subject.



29. A method of diagnosing a predetermined condition associated with low bone
mineral density of a subject, said method comprising the step of relating the
amount of, or relative amount of, one or more diagnostic species present in a
sample from said subject with said predetermined condition of said subject.

30. A method, according to claim 29, of diagnosing a predetermined condition
associated with low bone mineral density of a subject, said method comprising
the step of relating the amount of, or relative amount of, one or more diagnostic
species present in a sample from said subject with the presence or absence of
said predetermined condition of said subject.

31. A method, according to claim 29, of diagnosing a predetermined condition
associated with low bone mineral density of a subject, said method comprising
the step of relating a modulation of the amount of, or relative amount of, one or
more diagnostic species present in a sample from said subject, as compared to a
control sample, with said predetermined condition of said subject.

32. A method, according to claim 29, of diagnosing a predetermined condition
associated with low bone mineral density of a subject, said method comprising
the step of relating a modulation of the amount of, or relative amount of, one or
more diagnostic species present in a sample from said subject, as compared to a
control sample, with the presence or absence of said predetermined condition of
said subject.



* * * 



33. A method of classification, said method comprising the steps of:

(a) forming a predictive mathematical model by applying a modelling
method to modelling data;

(b) using said model to classify a test sample.

34. A method, according to claim 33, of classifying a test sample, said method
comprising the steps of:

(a) forming a predictive mathematical model by applying a modelling
method to modelling data;

wherein said modelling data comprises a plurality of data sets for
modelling samples of known class;

(b) using said model to classify said test sample as being a member of
one of said known classes.

35. A method, according to claim 33, of classifying a test sample, said method
comprising the steps of:

(a) forming a predictive mathematical model by applying a modelling
method to modelling data;

wherein said modelling data comprises at least one data set for each of a
plurality of modelling samples;

wherein said modelling samples define a class group consisting of a
plurality of classes;

wherein each of said modelling samples is of a known class selected from
said class group; and,

(b) using said model with a data set for said test sample to classify said
test sample as being a member of one class selected from said class group.

36. A method of classification, said method comprising the step of:
using a predictive mathematical model;
wherein said model is formed by applying a modelling method to
modelling data;

to classify a test sample.

37. A method, according to claim 36, of classifying a test sample, said method
comprising the step of:

using a predictive mathematical model;
wherein said model is formed by applying a modelling method to
modelling data;

wherein said modelling data comprises a plurality of data sets for
modelling samples of known class;

to classify said test sample as being a member of one of said known
classes.

38. A method, according to claim 36, of classifying a test sample, said method
comprising the step of:

using a predictive mathematical model;
wherein said model is formed by applying a modelling method to
modelling data;

wherein said modelling data comprises at least one data set for each of a
plurality of modelling samples;

wherein said modelling samples define a class group consisting of a
plurality of classes;

wherein each of said modelling samples is of a known class selected from
said class group;

with a data set for said test sample to classify said test sample as being a
member of one class selected from said class group. 
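By way of illustration only, the forming and using steps of claims 33 to 38 might be sketched in Python as follows. The sketch assumes a nearest-class-centroid rule as the modelling method, a stand-in chosen for brevity that is not one of the modelling methods named in the claims; the function names form_model and classify, the class labels and the toy data are all hypothetical.

```python
from math import dist
from statistics import fmean


def form_model(modelling_data):
    """Form a hypothetical nearest-centroid model.

    modelling_data: list of (data_set, known_class) pairs; each data set is a
    sequence of numbers of equal length (e.g. bucketed NMR intensities).
    Returns a dict mapping each class of the class group to its mean data set.
    """
    by_class = {}
    for data_set, known_class in modelling_data:
        by_class.setdefault(known_class, []).append(data_set)
    return {
        cls: [fmean(column) for column in zip(*data_sets)]
        for cls, data_sets in by_class.items()
    }


def classify(model, test_data_set):
    """Classify the test sample as the class whose centroid is nearest."""
    return min(model, key=lambda cls: dist(model[cls], test_data_set))


# Toy modelling samples of known class (illustrative values only).
modelling_data = [
    ([1.0, 0.2], "presence"),
    ([0.9, 0.3], "presence"),
    ([0.2, 1.1], "absence"),
    ([0.1, 0.9], "absence"),
]
model = form_model(modelling_data)
print(classify(model, [0.8, 0.25]))  # prints "presence"
```

Any other classifier that maps a data set to one class of the class group could replace the centroid rule without changing the two-step structure.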



* * * 



39. A method of classification, said method comprising the steps of:

(a) forming a predictive mathematical model by applying a modelling method to
modelling data;

(b) using said model to classify a subject.

40. A method, according to claim 39, of classifying a subject, said method comprising
the steps of:

(a) forming a predictive mathematical model by applying a modelling
method to modelling data;

wherein said modelling data comprises a plurality of data sets for
modelling samples of known class;

(b) using said model to classify a test sample from said subject as being a
member of one of said known classes, and thereby classify said subject.

41. A method, according to claim 39, of classifying a subject, said method comprising
the steps of:

(a) forming a predictive mathematical model by applying a modelling
method to modelling data;

wherein said modelling data comprises at least one data set for each of a
plurality of modelling samples;

wherein said modelling samples define a class group consisting of a
plurality of classes;

wherein each of said modelling samples is of a known class selected from
said class group; and,

(b) using said model with a data set for a test sample from said subject to
classify said test sample as being a member of one class selected from said
class group, and thereby classify said subject.



42. A method of classification, said method comprising the step of:
using a predictive mathematical model;
wherein said model is formed by applying a modelling method to
modelling data;

to classify a subject.

43. A method, according to claim 42, of classifying a subject, said method comprising
the step of:

using a predictive mathematical model;

wherein said model is formed by applying a modelling method to
modelling data;

wherein said modelling data comprises a plurality of data sets for
modelling samples of known class;

to classify a test sample from said subject as being a member of one of
said known classes, and thereby classify said subject.

44. A method, according to claim 42, of classifying a subject, said method comprising
the step of:

using a predictive mathematical model;
wherein said model is formed by applying a modelling method to
modelling data;

wherein said modelling data comprises at least one data set for each of a
plurality of modelling samples;

wherein said modelling samples define a class group consisting of a
plurality of classes;

wherein each of said modelling samples is of a known class selected from
said class group;

with a data set for a test sample from said subject to classify said test
sample as being a member of one class selected from said class group, and
thereby classify said subject.






45. A method of diagnosis, said method comprising the steps of:

(a) forming a predictive mathematical model by applying a modelling
method to modelling data;

(b) using said model to diagnose a subject.

46. A method, according to claim 45, of diagnosing a predetermined condition
associated with low bone mineral density of a subject, said method comprising
the steps of:

(a) forming a predictive mathematical model by applying a modelling
method to modelling data;

wherein said modelling data comprises a plurality of data sets for
modelling samples of known class;

(b) using said model to classify a test sample from said subject as being a
member of one of said known classes, and thereby diagnose said subject.

47. A method, according to claim 45, of diagnosing a predetermined condition
associated with low bone mineral density of a subject, said method comprising
the steps of:

(a) forming a predictive mathematical model by applying a modelling
method to modelling data;

wherein said modelling data comprises at least one data set for each of a
plurality of modelling samples;

wherein said modelling samples define a class group consisting of a
plurality of classes;

wherein each of said modelling samples is of a known class selected from
said class group; and,

(b) using said model with a data set for a test sample from said subject to
classify said test sample as being a member of one class selected from said
class group, and thereby diagnose said subject.

48. A method of diagnosis, said method comprising the step of:
using a predictive mathematical model;
wherein said model is formed by applying a modelling method to
modelling data;

to diagnose a subject.



49. A method, according to claim 48, of diagnosing a predetermined condition
associated with low bone mineral density of a subject, said method comprising
the step of:

using a predictive mathematical model;
wherein said model is formed by applying a modelling method to
modelling data;

wherein said modelling data comprises a plurality of data sets for
modelling samples of known class;

to classify a test sample from said subject as being a member of one of
said known classes, and thereby diagnose said subject.

50. A method, according to claim 48, of diagnosing a predetermined condition
associated with low bone mineral density of a subject, said method comprising
the step of:

using a predictive mathematical model;
wherein said model is formed by applying a modelling method to
modelling data;

wherein said modelling data comprises at least one data set for each of a
plurality of modelling samples;

wherein said modelling samples define a class group consisting of a
plurality of classes;

wherein each of said modelling samples is of a known class selected from
said class group;

with a data set for a test sample from said subject to classify said test
sample as being a member of one class selected from said class group, and
thereby diagnose said subject.






51. A method according to any one of claims 1 to 50, wherein said test sample is a
test sample from a subject, and said predetermined condition is a predetermined
condition of said subject.



52. A method according to any one of claims 1 to 50, wherein said "a modulation of"
is "an increase or decrease in".
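Claim 52 reads "a modulation of" as "an increase or decrease in". A minimal Python sketch of that comparison against a control sample follows, assuming a simple fractional-change test with a hypothetical 10% tolerance; neither the function name modulation nor the tolerance value comes from the claims.

```python
def modulation(test_amount, control_amount, tolerance=0.1):
    """Describe the modulation of a species' (relative) amount versus a control.

    Returns "increase", "decrease" or "no modulation", depending on whether the
    fractional change relative to the control exceeds the hypothetical tolerance.
    """
    if control_amount <= 0:
        raise ValueError("control amount must be positive")
    change = (test_amount - control_amount) / control_amount
    if change > tolerance:
        return "increase"
    if change < -tolerance:
        return "decrease"
    return "no modulation"


print(modulation(1.5, 1.0))  # prints "increase"
```

In practice the tolerance would be set from the variability observed in control samples rather than fixed in advance.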






* * * 
53. A method according to any one of claims 1 to 52, wherein said relating step
involves the use of a predictive mathematical model.

54. A method according to any one of claims 1 to 52, wherein said modelling method
is a multivariate statistical analysis modelling method.

55. A method according to any one of claims 1 to 52, wherein said modelling method
is a multivariate statistical analysis modelling method which employs a pattern
recognition method.


56. A method according to any one of claims 1 to 52, wherein said modelling method 
is, or employs PCA. 

57. A method according to any one of claims 1 to 52, wherein said modelling method
is, or employs PLS.

58. A method according to any one of claims 1 to 52, wherein said modelling method
is, or employs PLS-DA.
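Claims 56 to 58 name PCA, PLS and PLS-DA as candidate modelling methods. A compact PLS-DA sketch in Python/NumPy is given below by way of example. It assumes a two-class group coded as y = 1 (presence) and y = 0 (absence), a single-response NIPALS PLS fit, and a 0.5 decision threshold; these choices and the function names pls_da_fit and pls_da_predict are illustrative assumptions rather than the applicant's implementation.

```python
import numpy as np


def pls_da_fit(X, y, n_components=1):
    """Fit a single-response PLS (PLS1, NIPALS) model to 0/1 class codes.

    X: (n_samples, n_variables) data sets, e.g. bucketed NMR spectra.
    y: (n_samples,) class codes, 1 = presence, 0 = absence of the condition.
    Returns (x_mean, y_mean, B), where B holds the regression coefficients.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean            # mean-centred residuals
    weights, loadings, y_loadings = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w = w / np.linalg.norm(w)            # unit-length weight vector
        t = E @ w                            # sample scores
        tt = t @ t
        p = E.T @ t / tt                     # X loadings
        c = f @ t / tt                       # y loading
        E = E - np.outer(t, p)               # deflate X
        f = f - c * t                        # deflate y
        weights.append(w)
        loadings.append(p)
        y_loadings.append(c)
    W = np.array(weights).T
    P = np.array(loadings).T
    q = np.array(y_loadings)
    B = W @ np.linalg.solve(P.T @ W, q)      # B = W (P'W)^-1 q
    return x_mean, y_mean, B


def pls_da_predict(model, X, threshold=0.5):
    """Return predicted class codes (1 = presence, 0 = absence)."""
    x_mean, y_mean, B = model
    y_hat = (np.asarray(X, dtype=float) - x_mean) @ B + y_mean
    return (y_hat > threshold).astype(int)
```

The number of components would normally be chosen by cross-validation; one component is used here only to keep the sketch small.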

59. A method according to any one of claims 1 to 58, wherein said modelling method
includes a step of data filtering.

60. A method according to any one of claims 1 to 58, wherein said modelling method 
includes a step of orthogonal data filtering. 






61. A method according to any one of claims 1 to 58, wherein said modelling method
includes a step of OSC.
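Claims 59 to 61 add a data-filtering step, including orthogonal data filtering such as OSC (orthogonal signal correction). Published OSC algorithms are iterative; the Python/NumPy sketch below is a deliberately simplified, non-iterative variant, offered only to show the underlying idea. It removes from the mean-centred data one component that lies entirely in the part of X orthogonal to the class vector y, so the covariance of X with y is left unchanged. The function name orthogonal_filter is hypothetical, and this is not the OSC procedure used in the application.

```python
import numpy as np


def orthogonal_filter(X, y):
    """Remove one y-orthogonal component from mean-centred X.

    Returns (X_filtered, t, p): the filtered data, the removed scores t
    (orthogonal to the centred y) and the corresponding loading vector p.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Part of X carrying no linear information about y (projection onto y removed).
    Z = Xc - np.outer(yc, yc @ Xc) / (yc @ yc)
    # Dominant direction of that y-orthogonal variation.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    p = Vt[0]
    t = Z @ p                     # orthogonal to yc by construction
    X_filtered = Xc - np.outer(t, p)
    return X_filtered, t, p
```

Because t is orthogonal to y, subtracting t p' leaves y'X unchanged, which is the property an orthogonal filter is meant to preserve.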



62. A method according to any one of claims 1 to 61, wherein said model takes
account of one or more diagnostic species.



63. A method according to any one of claims 1 to 62, wherein said modelling data
comprise spectral data.
64. A method according to any one of claims 1 to 62, wherein said modelling data
comprise both spectral data and non-spectral data.

65. A method according to any one of claims 1 to 62, wherein said modelling data
comprise NMR spectral data.

66. A method according to any one of claims 1 to 62, wherein said modelling data
comprise both NMR spectral data and non-NMR spectral data.

67. A method according to any one of claims 1 to 62, wherein said NMR spectral
data comprises ¹H NMR spectral data and/or ¹³C NMR spectral data.

68. A method according to any one of claims 1 to 62, wherein said NMR spectral
data comprises ¹H NMR spectral data.

69. A method according to any one of claims 1 to 62, wherein said modelling data
comprise spectra.

70. A method according to any one of claims 1 to 62, wherein said modelling data are
spectra.

71. A method according to any one of claims 1 to 70, wherein said modelling data
comprises a plurality of data sets for modelling samples of known class.

72. A method according to any one of claims 1 to 70, wherein said modelling data
comprises at least one data set for each of a plurality of modelling samples.

73. A method according to any one of claims 1 to 70, wherein said modelling data
comprises exactly one data set for each of a plurality of modelling samples.

74. A method according to any one of claims 1 to 70, wherein said using step is:
using said model with a data set for said test sample to classify said test sample
as being a member of one class selected from said class group.

75. A method according to any one of claims 1 to 74, wherein each of said data sets
comprises spectral data.
76. A method according to any one of claims 1 to 74, wherein each of said data sets
comprises both spectral data and non-spectral data.

77. A method according to any one of claims 1 to 74, wherein each of said data sets
comprises NMR spectral data.



78. A method according to any one of claims 1 to 74, wherein each of said data sets
comprises both NMR spectral data and non-NMR spectral data.

79. A method according to any one of claims 1 to 74, wherein said NMR spectral
data comprises ¹H NMR spectral data and/or ¹³C NMR spectral data.

80. A method according to any one of claims 1 to 74, wherein said NMR spectral
data comprises ¹H NMR spectral data.

81. A method according to any one of claims 1 to 74, wherein each of said data sets
comprises a spectrum.

82. A method according to any one of claims 1 to 74, wherein each of said data sets
comprises a ¹H NMR spectrum and/or ¹³C NMR spectrum.

83. A method according to any one of claims 1 to 74, wherein each of said data sets
comprises a ¹H NMR spectrum.

84. A method according to any one of claims 1 to 74, wherein each of said data sets
is a spectrum.

85. A method according to any one of claims 1 to 74, wherein each of said data sets
is a ¹H NMR spectrum and/or ¹³C NMR spectrum.

86. A method according to any one of claims 1 to 74, wherein each of said data sets
is a ¹H NMR spectrum.



87. A method according to any one of claims 1 to 86, wherein said non-spectral data
is non-spectral clinical data.
88. A method according to any one of claims 1 to 86, wherein said non-NMR spectral
data is non-spectral clinical data.



89. A method according to any one of claims 1 to 88, wherein said class group
comprises classes associated with said predetermined condition.

90. A method according to any one of claims 1 to 88, wherein said class group
comprises exactly two classes.

91. A method according to any one of claims 1 to 88, wherein said class group
comprises exactly two classes: presence of said predetermined condition; and
absence of said predetermined condition.



92. A method according to any one of claims 1 to 91, wherein said sample is an
in vivo sample.

93. A method according to any one of claims 1 to 91, wherein said sample is an
ex vivo sample.

94. A method according to any one of claims 1 to 91, wherein said sample is a blood
sample or a blood-derived sample.

95. A method according to any one of claims 1 to 91, wherein said sample is a blood
sample.

96. A method according to any one of claims 1 to 91, wherein said sample is a blood
plasma sample.

97. A method according to any one of claims 1 to 91, wherein said sample is a blood
serum sample.
98. A method according to any one of claims 1 to 97, wherein said subject is an
animal.

99. A method according to any one of claims 1 to 97, wherein said subject is a
mammal.

100. A method according to any one of claims 1 to 97, wherein said subject is a
human.

102. A method according to any one of claims 1 to 100, wherein said one or more
[...] diagnostic spectral windows.

103. A method according to any one of claims 1 to 100, wherein [...]
of diagnostic spectral windows, and [...] predetermined diagnostic [...]

105. [...]
106. A method according to any one of claims 1 to 104, wherein at least one of said
one or more predetermined diagnostic spectral windows encompasses a
chemical shift value for an NMR resonance of a diagnostic species.

107. A method according to any one of claims 1 to 104, wherein each of a plurality of
said one or more predetermined diagnostic spectral windows encompasses a
chemical shift value for an NMR resonance of a diagnostic species.

108. A method according to any one of claims 1 to 104, wherein each of said one or
more predetermined diagnostic spectral windows encompasses a chemical shift
value for an NMR resonance of a diagnostic species.

109. A method according to any one of claims 106 to 108, wherein said NMR
resonance is a ¹H NMR resonance.

110. A method according to any one of claims 1 to 109, wherein said one or more
diagnostic species are endogenous diagnostic species.

111. A method according to any one of claims 1 to 109, wherein said one or more
diagnostic species are associated with NMR spectral intensity at predetermined
diagnostic spectral windows.

112. A method according to any one of claims 1 to 111, wherein said one or more
diagnostic species are a plurality of diagnostic species.



113. A method according to any one of claims 1 to 111, wherein said one or more
diagnostic species is a single diagnostic species.

114. A method according to any one of claims 1 to 113, wherein said classification is
performed on the basis of an amount, or a relative amount, of a single diagnostic
species.
115. A method according to any one of claims 1 to 113, wherein said classification is
performed on the basis of an amount, or a relative amount, of a plurality of
diagnostic species.

116. A method according to any one of claims 1 to 113, wherein said classification is
performed on the basis of an amount, or a relative amount, of each of a plurality
of diagnostic species.

117. A method according to any one of claims 1 to 113, wherein said classification is
performed on the basis of a total amount, or a relative total amount, of a plurality
of diagnostic species.

118. A method according to any one of claims 1 to 113, wherein:

said one or more diagnostic species is: a plurality of diagnostic species;

and,

said amount of, or relative amount of, one or more diagnostic species is: a
combination of a plurality of amounts, or relative amounts, each of which is the
amount of, or relative amount of, one of said plurality of diagnostic species.

119. A method according to claim 118, wherein said combination is a linear
combination.
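Claims 118 and 119 classify on a combination, for example a linear combination, of the amounts or relative amounts of several diagnostic species. A minimal Python sketch of such a score is given below; the species names, weights, offset and the zero decision threshold are all hypothetical placeholders, not values taken from Table 1-OP or Table 2-OP.

```python
def combined_score(amounts, weights, offset=0.0):
    """Linear combination of per-species (relative) amounts.

    amounts, weights: dicts keyed by diagnostic species name.
    """
    return offset + sum(weights[name] * amounts[name] for name in weights)


def classify_by_score(amounts, weights, offset=0.0, threshold=0.0):
    """Assign "presence" if the combined score exceeds the threshold."""
    score = combined_score(amounts, weights, offset)
    return "presence" if score > threshold else "absence"


# Hypothetical weights for two hypothetical diagnostic species.
weights = {"species_A": 2.0, "species_B": -1.5}
print(classify_by_score({"species_A": 1.0, "species_B": 0.4}, weights))  # prints "presence"
```

In a fitted PLS-DA model the regression coefficients play the role of these weights, which is one way such a linear combination could arise.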



* * * 



120. A method according to any one of claims 1 to 119, wherein said predetermined
diagnostic spectral windows are defined by one or more index values, 6r,
corresponding to the bucket regions listed in Table 1-OP and/or Table 2-OP.

121. A method according to any one of claims 1 to 119, wherein at least one of said
one or more predetermined diagnostic species is a species described in Table
1-OP and/or Table 2-OP.

122. A method according to any one of claims 1 to 119, wherein each of a plurality of
said one or more predetermined diagnostic species is a species described in
Table 1-OP and/or Table 2-OP.
123. A method according to any one of claims 1 to 119, wherein each of said one or
more predetermined diagnostic species is a species described in Table 1-OP
and/or Table 2-OP.
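Claims 120 to 123 define the diagnostic spectral windows by bucket regions. The bucket definitions of Table 1-OP and Table 2-OP are not reproduced in this section, so the Python/NumPy sketch below simply assumes equal-width buckets (0.04 ppm is used purely as an example width) and sums the NMR intensity falling inside each window, labelling each bucket by its centre chemical shift. The function name and its parameters are hypothetical.

```python
import numpy as np


def bucket_spectrum(ppm, intensity, start, stop, width=0.04):
    """Sum spectral intensity in equal-width windows from start to stop (ppm).

    Returns (centres, integrals): the centre chemical shift of each bucket and
    the summed intensity of the points falling inside it.
    """
    ppm = np.asarray(ppm, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    n_buckets = int(round((stop - start) / width))
    edges = start + width * np.arange(n_buckets + 1)
    # np.digitize returns 1-based bin numbers; points outside [start, stop) are dropped.
    idx = np.digitize(ppm, edges) - 1
    inside = (idx >= 0) & (idx < n_buckets)
    integrals = np.bincount(idx[inside], weights=intensity[inside],
                            minlength=n_buckets)
    centres = (edges[:-1] + edges[1:]) / 2
    return centres, integrals
```

Each resulting integral is one candidate experimental parameter; a predictive model would then be built on the vector of bucket integrals for each sample.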



124. A method of identifying a diagnostic species, or a combination of a plurality of
diagnostic species, for a predetermined condition associated with low bone
mineral density, said method comprising the steps of:

(a) applying a multivariate statistical analysis method to experimental
data;

wherein said experimental data comprises at least one data set comprising
experimental parameters measured for each of a plurality of experimental
samples;

wherein said experimental samples define a class group consisting of a
plurality of classes;

wherein at least one of said plurality of classes is a class associated with
said predetermined condition, e.g., a class associated with the presence of said
predetermined condition;

wherein at least one of said plurality of classes is a class not associated
with said predetermined condition, e.g., a class associated with the absence of
said predetermined condition;

wherein each of said experimental samples is of known class selected
from said class group;

and:

(b) identifying one or more critical experimental parameters;
wherein each of said critical experimental parameters is statistically
significantly different for classes of said class group, e.g., is statistically significant
for discriminating between classes of said class group; and,

(c) matching each of one or more of said one or more critical experimental
parameters with said diagnostic species;

or:
(b) identifying a combination of a plurality of critical experimental
parameters;

wherein said combination of a plurality of critical experimental parameters
is statistically significantly different for classes of said class group, e.g., is
statistically significant for discriminating between classes of said class group;

and,

(c) matching each of one or more of said plurality of critical experimental
parameters with said combination of a plurality of diagnostic species.

125. A method, according to claim 124, wherein:

one or more of said critical experimental parameters is a spectral
parameter; and

said identifying and matching steps are:

(b) identifying one or more critical experimental spectral parameters; and,

(c) matching each of one or more of said one or more critical experimental
spectral parameters with a spectral feature, e.g., a spectral peak;

and matching one or more of said spectral peaks with said diagnostic
species;

or

(b) identifying a combination of a plurality of critical experimental spectral
parameters; and,

(c) matching each of a plurality of said plurality of critical experimental
spectral parameters with a spectral feature, e.g., a spectral peak;

and matching one or more of said spectral peaks with said combination of
a plurality of diagnostic species.

126. A method according to any one of claims 124 to 125, wherein said multivariate
statistical analysis method is a multivariate statistical analysis method which
employs a pattern recognition method.

127. A method according to any one of claims 124 to 126, wherein said multivariate
statistical analysis method is, or employs PCA.
128. A method according to any one of claims 124 to 126, wherein said multivariate
statistical analysis method is, or employs PLS.

129. A method according to any one of claims 124 to 126, wherein said multivariate
statistical analysis method is, or employs PLS-DA.

130. A method according to any one of claims 124 to 129, wherein said multivariate
statistical analysis method includes a step of data filtering.

131. A method according to any one of claims 124 to 129, wherein said multivariate
statistical analysis method includes a step of orthogonal data filtering.

132. A method according to any one of claims 124 to 129, wherein said multivariate
statistical analysis method includes a step of OSC.

133. A method according to any one of claims 124 to 132, wherein said experimental
parameters comprise spectral data.

134. A method according to any one of claims 124 to 132, wherein said experimental
parameters comprise both spectral data and non-spectral data.

135. A method according to any one of claims 124 to 132, wherein said experimental
parameters comprise NMR spectral data.

136. A method according to any one of claims 124 to 132, wherein said experimental
parameters comprise both NMR spectral data and non-NMR spectral data.

137. A method according to any one of claims 124 to 136, wherein said NMR spectral
data comprises ¹H NMR spectral data and/or ¹³C NMR spectral data.

138. A method according to any one of claims 124 to 136, wherein said NMR spectral
data comprises ¹H NMR spectral data.

139. A method according to any one of claims 124 to 138, wherein said non-spectral
data is non-spectral clinical data.
140. A method according to any one of claims 124 to 138, wherein said non-NMR
spectral data is non-spectral clinical data.

141. A method according to any one of claims 124 to 140, wherein said critical
experimental parameters are spectral parameters.

142. A method according to any one of claims 124 to 141, wherein said class group
comprises classes associated with said predetermined condition.

143. A method according to any one of claims 124 to 142, wherein said class group
comprises exactly two classes.

144. A method according to any one of claims 124 to 142, wherein said class group
comprises exactly two classes: presence of said predetermined condition; and
absence of said predetermined condition.

145. A method according to any one of claims 124 to 142, wherein said class
associated with said predetermined condition is a class associated with the
presence of said predetermined condition.

146. A method according to any one of claims 124 to 142, wherein said class not
associated with said predetermined condition is a class associated with the
absence of said predetermined condition.

147. A method according to any one of claims 124 to 146, said method further
comprising the additional step of:

(d) confirming the identity of said diagnostic species.
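Step (b) of claim 124 identifies experimental parameters that differ significantly between classes of the class group. As one possible criterion, swapped in here for illustration because the claims do not prescribe a particular test, the Python/NumPy sketch below computes a Welch t-statistic for each parameter between two classes and keeps those whose absolute value exceeds a hypothetical cut-off. A full analysis would convert these statistics to p-values and correct for multiple testing; the function names and the cut-off of 3.0 are assumptions.

```python
import numpy as np


def welch_t(a, b):
    """Welch's t-statistic for each column of a versus the same column of b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return (a.mean(axis=0) - b.mean(axis=0)) / se


def critical_parameters(X, labels, cutoff=3.0):
    """Indices of parameters whose |t| between the two classes exceeds cutoff.

    X: (n_samples, n_parameters) experimental data.
    labels: 1 = class with the condition present, 0 = class with it absent.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    t = welch_t(X[labels == 1], X[labels == 0])
    return np.flatnonzero(np.abs(t) > cutoff)
```

The indices returned would correspond to, for example, bucket regions, which step (c) then matches with spectral peaks and diagnostic species.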



148. A computer system or device, such as a computer or linked computers,
operatively configured to implement a method according to any one of claims 1 to
147.

149. Computer code suitable for implementing a method according to any one of
claims 1 to 147 on a suitable computer system.
150. A computer program comprising computer program means adapted to perform a
method according to any one of claims 1 to 147, when said program
is run on a computer.

151. A computer program according to claim 150, embodied on a computer readable
medium.

152. A data carrier which carries computer code suitable for implementing a method
according to any one of claims 1 to 147 on a suitable computer.

153. Computer code and/or computer readable data representing a predictive
mathematical model as described in any one of claims 1 to 147.

154. A data carrier which carries computer code and/or computer readable data
representing a predictive mathematical model as described in any one of claims 1
to 147.

155. A computer system or device, such as a computer or linked computers,
programmed or loaded with computer code and/or computer readable data
representing a predictive mathematical model as described in any one of claims 1
to 147.

156. A system comprising:

(a) a first component comprising a device for obtaining NMR spectral
intensity data for a sample; and,

(b) a second component comprising a computer system or device, such as
a computer or linked computers, operatively configured to implement a method
according to any one of claims 1 to 147, and operatively linked to said first
component.

157. A diagnostic species identified by a method according to any one of claims 124 to
147.
158. A diagnostic species identified by a method according to any one of claims 124 to
147 for use in a method of classification.

159. A method of classification which employs or relies upon one or more diagnostic
species identified by a method according to any one of claims 124 to 147.

160. Use of one or more diagnostic species identified by a method of classification
according to any one of claims 124 to 147.

161. An assay for use in a method of classification, which assay relies upon one or
more diagnostic species identified by a method according to any one of claims
124 to 147.

162. Use of an assay in a method of classification, which assay relies upon one or
more diagnostic species identified by a method according to any one of claims
124 to 147.

163. A method of therapeutic monitoring of a subject undergoing therapy which
employs a method of classification according to any one of claims 1 to 123.

164. A method of evaluating drug therapy and/or drug efficacy which employs a
method of classification according to any one of claims 1 to 123.
Figure 1B-OP
Figure 1F-OP
LACK OF UNITY OF INVENTION

The present application does not meet the requirements of Rule 13 PCT for the following 
reasons: 

1. Claims 1 and 9 specify:

- a method of classifying a sample comprising the step of relating NMR spectral
intensity at one or more predetermined diagnostic spectral windows for said sample
with low bone mineral density.

2. Claim 13 specifies:

- a method of diagnosing a predetermined condition associated with low bone mineral
density of a subject
[not a method of classifying].

3. Claim 17 specifies: 

- a method of classifying a sample comprising the step of relating the [relative] amount 
of one or more diagnostic species present in said sample with low bone mineral 
density 

[not linked to the step of relating NMR spectral intensity at one or more predetermined
diagnostic spectral windows for said sample with low bone mineral density].

4. Claim 25 specifies: 

- a method of classifying a subject comprising the step of relating the [relative] amount 
of one or more diagnostic species present in a sample from said subject with low bone 
mineral density of said subject 

[not linked to the step of relating NMR spectral intensity at one or more predetermined
diagnostic spectral windows for said sample with low bone mineral density].

5. Claim 29 specifies:

- a method of diagnosing a predetermined condition associated with low bone mineral
density of a subject comprising the step of relating the [relative] amount of one or
more diagnostic species present in a sample from said subject with a predetermined
condition of said subject
[not linked to the step of relating NMR spectral intensity at one or more predetermined
diagnostic spectral windows for said sample with low bone mineral density]. 

6. Claim 33 specifies: 

- a method of classification comprising the steps of: 

- forming a predictive mathematical model ... 

- using said model to classify a test sample 

[not linked to the step of relating NMR spectral intensity at one or more predetermined 
diagnostic spectral windows for said sample with low bone mineral density]. 

7. Claim 36 specifies:

- a method of classification comprising the step of: 

- using a predictive mathematical model formed by applying a modelling method to 
modelling data 

[not linked to the step of relating NMR spectral intensity at one or more predetermined 
diagnostic spectral windows for said sample with low bone mineral density]. 

8. Claim 39 specifies: 

- a method of classification comprising the steps of: 

- forming a predictive mathematical model by applying a modelling method to 
modelling data ... 

[not linked to the step of relating NMR spectral intensity at one or more predetermined
diagnostic spectral windows for said sample with low bone mineral density]. 

9. Claim 42 specifies: 

- a method of classification comprising the step of: 

- using a predictive mathematical model formed by applying a modelling method to 
modelling data 

[not linked to the step of relating NMR spectral intensity at one or more predetermined 
diagnostic spectral windows for said sample with low bone mineral density]. 

10. Claim 45 specifies:
- a method of diagnosis comprising the steps of:

- forming a predictive mathematical model formed by applying a modelling method to 
modelling data 

[not linked to the step of relating NMR spectral intensity at one or more predetermined 
diagnostic spectral windows for said sample with low bone mineral density]. 

11. Claim 48 specifies:

- a method of diagnosis comprising the steps of: 

- forming a predictive mathematical model formed by applying a modelling method
to modelling data 

[not linked to the step of relating NMR spectral intensity at one or more predetermined 
diagnostic spectral windows for said sample with low bone mineral density]. 

12. Claim 124 specifies:

- a method of identifying a diagnostic species ... comprising the steps of: 

- applying a multivariate statistical analysis method to experimental data ...

[not linked to the method of classifying a sample or to the step of relating NMR
spectral intensity at one or more predetermined diagnostic spectral windows for said
sample with low bone mineral density]. 

13. Claim 148 specifies:

- a computer system ... operatively configured to implement a method according to
claims 1 to 147 

[since the different methods specified in the claims mentioned above are not linked by
a single inventive concept, also the features of the computers implementing these 
methods cannot be linked by a single inventive concept].

14. Claim 149 specifies: 

- computer code suitable for implementing a method according to claims 1 to 147 
[since the different methods specified in the claims mentioned above are not linked by 
a single inventive concept, also the computer codes implementing these different 
methods cannot be linked by a single inventive concept]. 
15. Claim 150 specifies:

- a computer program comprising computer means adapted to perform a method
according to claims 1 to 147 

[since the different methods specified in the claims mentioned above are not linked by
a single inventive concept, also the features of the computer programs performing these 
methods cannot be linked by a single inventive concept]. 

16. Claim 15g specifies: 

- a data carrier carrying computer code for implementing a method according to
claims 1 to 147 

[since the different methods specified in the claims mentioned above are not linked by 
a single inventive concept, also the features of the data carrier carrying computer code 
for implementing these methods cannot be linked by a single inventive concept]. 

17. Claim 153 specifies: 

- computer code representing a predictive mathematical model as described in any 
one of claims 1 to 147 

[since the different predictive mathematical models specified in the claims mentioned 
above are not linked by a single inventive concept, also the features of the computer
code representing these models cannot be linked by a single inventive concept].

18. Claim 154 specifies:

- a data carrier carrying computer code representing a predictive mathematical
model as described in any one of claims 1 to 147 

[since the different predictive mathematical models specified in the claims mentioned 
above are not linked by a single inventive concept, also the features of the data carrier 
carrying these models cannot be linked by a single inventive concept].

19. Claim 155 specifies:

- a computer system programmed with computer code representing a predictive 
mathematical model as described in any one of claims 1 to 147 
[since the different predictive mathematical models specified in the claims mentioned
above are not linked by a single inventive concept, also the features of the computer 
system programmed with computer code representing the methods / models of claims 
1 to 164 cannot be linked by a single inventive concept]. 

20. Claim 156 specifies: 

- a system comprising 

- a first component comprising a device for obtaining NMR spectral intensity data
for a sample;

- a second component comprising a computer system operatively configured to 
implement a method according to any one of claims 1 to 147 

[the above arguments with respect to claim 165 also apply here].

21. Claim 157 specifies: 

- a diagnostic species identified by a method according to any one of claims 136 to
147

[since the methods specified in claims 136 to 164 are not linked by a single inventive 
concept, also the corresponding diagnostic species cannot be linked by a single inventive
concept; furthermore, a diagnostic species cannot be linked by a single inventive concept
to the computer or to the data carrier specified in claim 154]. 

22. Claim 158 specifies: 

- a diagnostic species identified by a method according to any one of claims 136 to
147 for use in a method of classification

[since the methods specified in claims 136 to 164 are not linked by a single inventive
concept, also the corresponding diagnostic species cannot be linked by a single inventive
concept; furthermore, a diagnostic species cannot be linked by a single inventive concept
to the computer or to the data carrier specified in claim 154]. 

23. Claim 159 specifies:

- a method of classification which employs one or more diagnostic species identified
by a method of classification 
[not linked to the step of relating NMR spectral intensity at one or more predetermined
diagnostic spectral windows for said sample with low bone mineral density].

24. Claim 160 specifies: 

- use of one or more diagnostic species 

[not linked to the step of relating NMR spectral intensity at one or more predetermined
diagnostic spectral windows for said sample with low bone mineral density].

25. Claim 161 specifies:

- an assay for use in a method of classification, which assay relies upon one or 
more diagnostic species identified by a method according to any one of claims 124
to 147

[since the methods specified in claims 124 to 147 are not linked by a single inventive 
concept, also the corresponding assays cannot be linked by a single inventive concept;
furthermore, an assay cannot be linked by a single inventive concept to the computer or
data carrier specified in claim 154].

26. Claim 162 specifies:

- use of an assay for use in a method of classification, which assay relies upon one 
or more diagnostic species identified by a method according to any one of claims
124 to 147 

[since the methods specified in claims 124 to 147 are not linked by a single inventive
concept, also the corresponding assay uses cannot be linked by a single inventive
concept; furthermore, the use of an assay cannot be linked by a single inventive concept
to a method of classification]. 

27. Claim 163 specifies:

- a method of therapeutic monitoring ... 
[such a method cannot be linked by a single inventive concept to a method of
classification].

28. Claim 164 specifies:
- a method of evaluating drug therapy ...
[such a method cannot be linked by a single inventive concept to a method of 
classification]. 



Since the subject-matter of these claims is not linked by a single general inventive concept,
there are 28 different inventions specified in the independent claims. Therefore, the application
does not meet the requirements of Rule 13 PCT. 

Only the subject-matter of the invention first mentioned in claims 1-12 has been subjected 
to a search. A search for each of the other 27 inventions can be carried out upon your request.
For each additional invention a standard search fee will be due. 